From everson at evertype.com Sat Apr 1 12:24:25 2017
From: everson at evertype.com (Michael Everson)
Date: Sat, 1 Apr 2017 19:24:25 +0200
Subject: Proposal to add standardized variation sequences for chess notation
Message-ID: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com>

Variation Sequences have been implemented for a number of symbol characters recently to make them useful for specialized purposes.

Here is a proposal which solves a long-standing problem for an important set of symbols in the UCS.

https://www.dropbox.com/sh/p9vga1dc2t02pqw/AABL4XwI-ZERDbnLJmvJJvtja?dl=0

Enjoy,
Michael Everson

From verdy_p at wanadoo.fr Sat Apr 1 14:21:36 2017
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sat, 1 Apr 2017 21:21:36 +0200
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com>
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com>
Message-ID:

I like these proposed border-box characters, which were clearly missing in the box-drawing set (where they exist only when they pass through the center of a cell). However, unless they are used in monospaced fonts, I don't think that all of them have to match the same width as the checkers cells, notably the 2 vertical and 4 corner ones, which can clearly be narrower (only the 2 horizontal ones, top or bottom, need to match the cell).

Also, if a variation selector is used for a white or black square, the rendering should still extend the width of the pieces drawn inside to center them in a square board cell. Pieces without these background selectors can still use proportional width (for example in texts showing game positions).

Note also that for draughts pieces, in French they are not called "homme" (=man) and "roi" (=king), but "pion" (=pawn) and "dame" (or "reine", both meaning "queen" in chess, draughts and card games; the draughts game itself is named "dames", in the plural).
Many draughts and chess players may use chess pieces to play draughts (if there are not enough kings/queens among the chess pieces, they can just as well use other pieces except pawns). The board itself may be any suitable grid. Some will use grains or small rocks for pawns and real coins (white metal vs. yellow/red metal) for king/queen. In classrooms (where pieces are too frequently lost), children build their own pieces out of colored paper or cardboard, and every player has in fact played with friends or family using such substitutes; it is even easier and more friendly than playing with two small smartphones/tablets and a connected app (those apps don't need Unicode encoding at all, they use their own graphics).

2017-04-01 19:24 GMT+02:00 Michael Everson :

> Variation Sequences have been implemented for a number of symbol
> characters recently to make them useful for specialized purposes.
>
> Here is a proposal which solves a long-standing problem for an important
> set of symbols in the UCS.
>
> https://www.dropbox.com/sh/p9vga1dc2t02pqw/AABL4XwI-ZERDbnLJmvJJvtja?dl=0
>
> Enjoy,
> Michael Everson
>

From christoph.paeper at crissov.de Sat Apr 1 14:57:07 2017
From: christoph.paeper at crissov.de (Christoph Päper)
Date: Sat, 1 Apr 2017 21:57:07 +0200 (CEST)
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com>
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com>
Message-ID: <1766481648.24541.1491076627544.JavaMail.open-xchange@app08.ox.hosteurope.de>

Michael Everson :
>
> Variation Sequences have been implemented for a number of symbol characters
> recently to make them useful for specialized purposes.

This is where I still suspected there was an April Fools joke coming up.

> Here is a proposal which solves a long-standing problem for an important set
> of symbols in the UCS.
# Chesspiece on white versus Chesspiece on black variation sequences
25A1 FE00; White chessboard square; # WHITE SQUARE
25A8 FE01; Black chessboard square; # SQUARE WITH UPPER RIGHT TO LOWER LEFT FILL
2654 FE00; Chesspiece on white; # WHITE CHESS KING
2654 FE01; Chesspiece on black; # WHITE CHESS KING
2655 FE00; Chesspiece on white; # WHITE CHESS QUEEN
2655 FE01; Chesspiece on black; # WHITE CHESS QUEEN
2656 FE00; Chesspiece on white; # WHITE CHESS ROOK
2656 FE01; Chesspiece on black; # WHITE CHESS ROOK
2657 FE00; Chesspiece on white; # WHITE CHESS BISHOP
2657 FE01; Chesspiece on black; # WHITE CHESS BISHOP
2658 FE00; Chesspiece on white; # WHITE CHESS KNIGHT
2658 FE01; Chesspiece on black; # WHITE CHESS KNIGHT
2659 FE00; Chesspiece on white; # WHITE CHESS PAWN
2659 FE01; Chesspiece on black; # WHITE CHESS PAWN
265A FE00; Chesspiece on white; # BLACK CHESS KING
265A FE01; Chesspiece on black; # BLACK CHESS KING
265B FE00; Chesspiece on white; # BLACK CHESS QUEEN
265B FE01; Chesspiece on black; # BLACK CHESS QUEEN
265C FE00; Chesspiece on white; # BLACK CHESS ROOK
265C FE01; Chesspiece on black; # BLACK CHESS ROOK
265D FE00; Chesspiece on white; # BLACK CHESS BISHOP
265D FE01; Chesspiece on black; # BLACK CHESS BISHOP
265E FE00; Chesspiece on white; # BLACK CHESS KNIGHT
265E FE01; Chesspiece on black; # BLACK CHESS KNIGHT
265F FE00; Chesspiece on white; # BLACK CHESS PAWN
265F FE01; Chesspiece on black; # BLACK CHESS PAWN
26C0 FE00; Draughts piece on white; # WHITE DRAUGHTS MAN
26C0 FE01; Draughts piece on black; # WHITE DRAUGHTS MAN
26C1 FE00; Draughts piece on white; # WHITE DRAUGHTS KING
26C1 FE01; Draughts piece on black; # WHITE DRAUGHTS KING
26C2 FE00; Draughts piece on white; # BLACK DRAUGHTS MAN
26C2 FE01; Draughts piece on black; # BLACK DRAUGHTS MAN
26C3 FE00; Draughts piece on white; # BLACK DRAUGHTS KING
26C3 FE01; Draughts piece on black; # BLACK DRAUGHTS KING

□ U+25A1 and, especially, ▨ U+25A8 for empty fields on a board make no sense.
U+25A8 always shows as diagonals from the lower left to the upper right (much like a forward slash /). Black fields are often hatched this way, but could also be shown with a solid fill ■ U+25A0, a reverse diagonal fill ▧ U+25A7, a diamond pattern (diagonal crosshatch) ▩ U+25A9, a square pattern (orthogonal crosshatch) ▦ U+25A6, a vertical pattern ▥ U+25A5 or a horizontal pattern ▤ U+25A4.

I suggest you adopt a space character instead, e.g. U+2003 Em Space or U+2001 Em Quad.

2003 FE00; White chessboard square; # EM SPACE
2003 FE01; Black chessboard square; # EM SPACE

2001 FE00; White chessboard square; # EM QUAD
2001 FE01; Black chessboard square; # EM QUAD

□ U+25A1, ▢ U+25A2 or ⬚ U+2B1A would also work if you wanted a minimal amount of ink but not none.

25A1 FE00; White chessboard square; # WHITE SQUARE
25A1 FE01; Black chessboard square; # WHITE SQUARE

25A2 FE00; White chessboard square; # WHITE SQUARE WITH ROUNDED CORNERS
25A2 FE01; Black chessboard square; # WHITE SQUARE WITH ROUNDED CORNERS

2B1A FE00; White chessboard square; # DOTTED SQUARE
2B1A FE01; Black chessboard square; # DOTTED SQUARE

You should also evaluate a different approach altogether:

20DE FE00; Combining white chessboard square; # COMBINING ENCLOSING SQUARE
20DE FE01; Combining black chessboard square; # COMBINING ENCLOSING SQUARE

20DE FE00; Combining white background; # COMBINING ENCLOSING SQUARE
20DE FE01; Combining black background; # COMBINING ENCLOSING SQUARE

Although one would need to combine it with a space character as a base for empty fields, this would require only two new entries in StandardizedVariants.txt and be more flexible regarding alternate (Fairy Chess) game pieces - including emojis.
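The entries discussed in this message follow the field layout of StandardizedVariants.txt (base and selector in hex, a description, then the character name as a comment). As a minimal sketch of how such entries could be read by a program, assuming Python: the names VariationSequence and parse_entries are hypothetical, and the sample sequences are the proposed ones, not entries known to be in the published file.

```python
from typing import NamedTuple


class VariationSequence(NamedTuple):
    base: str         # base character, e.g. WHITE CHESS KING
    selector: str     # variation selector, e.g. U+FE00
    description: str  # e.g. "Chesspiece on white"
    name: str         # character name taken from the trailing comment


def parse_entries(text: str) -> list[VariationSequence]:
    """Parse lines of the form 'BASE VS; description; # NAME'."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        fields, _, comment = line.partition("#")
        parts = [p.strip() for p in fields.split(";")]
        base_hex, selector_hex = parts[0].split()
        entries.append(
            VariationSequence(
                base=chr(int(base_hex, 16)),
                selector=chr(int(selector_hex, 16)),
                description=parts[1],
                name=comment.strip(),
            )
        )
    return entries


# Sample taken from the list quoted in this thread (proposed, not standardized).
SAMPLE = """\
# Chesspiece on white versus Chesspiece on black variation sequences
25A1 FE00; White chessboard square; # WHITE SQUARE
25A8 FE01; Black chessboard square; # SQUARE WITH UPPER RIGHT TO LOWER LEFT FILL
2654 FE00; Chesspiece on white; # WHITE CHESS KING
2654 FE01; Chesspiece on black; # WHITE CHESS KING
"""

entries = parse_entries(SAMPLE)
```

Each parsed entry keeps the base character and the selector separate, so the same table could hold any of the alternative bases suggested above without changing the code.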
From everson at evertype.com Sat Apr 1 15:03:31 2017
From: everson at evertype.com (Michael Everson)
Date: Sat, 1 Apr 2017 22:03:31 +0200
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com>
Message-ID: <1D03D028-D29C-4846-BCF0-0D15A7C30A2D@evertype.com>

On 1 Apr 2017, at 21:21, Philippe Verdy wrote:
>
> I like these proposed border-box characters, which were clearly missing in the box-drawing set (where they exist only when they pass through the center of a cell).

This document does not propose any new characters.

Michael Everson

From everson at evertype.com Sat Apr 1 15:35:59 2017
From: everson at evertype.com (Michael Everson)
Date: Sat, 1 Apr 2017 22:35:59 +0200
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <1766481648.24541.1491076627544.JavaMail.open-xchange@app08.ox.hosteurope.de>
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <1766481648.24541.1491076627544.JavaMail.open-xchange@app08.ox.hosteurope.de>
Message-ID:

On 1 Apr 2017, at 21:57, Christoph Päper wrote:

> □ U+25A1 and, especially, ▨ U+25A8 for empty fields on a board make no sense.

Not so. Think about the data.

> U+25A8 always shows as diagonals from the lower left to the upper right (much like a forward slash /). Black fields are often hatched this way, but could also be shown with a solid fill ■ U+25A0, a reverse diagonal fill ▧ U+25A7, a diamond pattern (diagonal crosshatch) ▩ U+25A9, a square pattern (orthogonal crosshatch) ▦ U+25A6, a vertical pattern ▥ U+25A5 or a horizontal pattern ▤ U+25A4.

The *conventional* glyph used in international chess diagrams uses the character I chose (with the /// diagonals). That's the character which should be used to represent chess boards in plain text.
Nothing prevents a font designer from choosing to render it (in a chess font supporting this protocol) with a vertical pattern, or with dots, or as black, or whatever. Please distinguish characters from glyphs.

> I suggest you adopt a space character instead, e.g. U+2003 Em Space or U+2001 Em Quad.

No. Absolutely not. Spaces have a variety of properties. Spaces separate things, but are not things themselves. The white square on a chessboard is not a separating nothingness. It's a white square. Even when a chessboard is made of green and brown marble, one is still a white square, and one a black square. Even when chess pieces are made of yellow and red plastic, one is still a white piece and one is still a black piece.

In this proposal the squares and the pieces are all graphic symbols, all with the So (Symbol Other) property. Using the space characters you suggest would be a mistake; they have the Zs (Space Separator) property.

> □ U+25A1, ▢ U+25A2 or ⬚ U+2B1A would also work if you wanted a minimal amount of ink but not none.

Christoph, I've already implemented this and it works well and robustly. Glyphs could be altered in a variety of ways, but the point is that this is the kind of simple higher-level protocol which will solve a long-standing problem simply and easily, and allow the parsing of chess problems as text for analysis, and allow the generation of chess problem images from other descriptions of chess problems and solutions.

> 25A1 FE00; White chessboard square; # WHITE SQUARE
> 25A1 FE01; Black chessboard square; # WHITE SQUARE
>
> 25A2 FE00; White chessboard square; # WHITE SQUARE WITH ROUNDED CORNERS
> 25A2 FE01; Black chessboard square; # WHITE SQUARE WITH ROUNDED CORNERS
>
> 2B1A FE00; White chessboard square; # DOTTED SQUARE
> 2B1A FE01; Black chessboard square; # DOTTED SQUARE

Again there's no need to use a variety of characters to represent the chess squares. If you want to draw them in your font with a certain non-/// fill, you could.
But that is cosmetic and irrelevant. The point is to have the underlying board data as plain text, and that means just using the chess pieces and conventional white and black squares. Remember, the /// is drawn around the chess pieces with the FE01 but for those there is no 25A8 used. Please examine the non-OpenType plain text representations in Figure. They're even readable by humans.

> You should also evaluate a different approach altogether:
>
> 20DE FE00; Combining white chessboard square; # COMBINING ENCLOSING SQUARE
> 20DE FE01; Combining black chessboard square; # COMBINING ENCLOSING SQUARE
>
> 20DE FE00; Combining white background; # COMBINING ENCLOSING SQUARE
> 20DE FE01; Combining black background; # COMBINING ENCLOSING SQUARE
>
> Although one would need to combine it with a space character as a base for empty fields,

That's not remotely tempting. It would offer no advantage and would needlessly complicate the system.

> this would require only two new entries in StandardizedVariants.txt and be more flexible regarding alternate (Fairy Chess) game pieces - including emojis.

This proposal has nothing to do with emojis. This is a plain-text protocol for the representation of chessboard data in a parseable fashion. Should fairy chess characters be added to the standard, some additional entries would be added to StandardizedVariants.txt, yes. This is a finite set, however, and this should not be problematic to anybody. It's certainly simpler than a number of other recommended sequences which have been added to the standard for other purposes.

I thank you, sincerely, for your interest in this proposal; it has been considered and tested and it works better than what you have proposed, however. I could prepare additional fonts using dotted or black glyphs for the black squares as you suggest, but the strength of this proposal is that you could achieve those glyphs by simply switching from one font to another, with the underlying chess data preserved.
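As a rough illustration of "board data as plain text", one rank of a position could be turned into a string along the lines described in this message: empty squares use 25A1 FE00 or 25A8 FE01, and pieces take FE00 or FE01 according to the square they stand on. This is only a sketch in Python; encode_rank and is_dark are hypothetical names, and the square colouring assumes the usual orientation in which a1 is a dark square.

```python
WHITE_SQUARE = "\u25A1\uFE00"  # empty white square, as listed in the proposal
BLACK_SQUARE = "\u25A8\uFE01"  # empty black square, as listed in the proposal
ON_WHITE = "\uFE00"            # selector for a piece standing on a white square
ON_BLACK = "\uFE01"            # selector for a piece standing on a black square


def is_dark(file_index: int, rank_index: int) -> bool:
    """Files a..h map to 0..7, ranks 1..8 map to 0..7; a1 is dark."""
    return (file_index + rank_index) % 2 == 0


def encode_rank(pieces: list, rank_index: int) -> str:
    """Encode one rank; each entry is a chess symbol character or None."""
    out = []
    for file_index, piece in enumerate(pieces):
        dark = is_dark(file_index, rank_index)
        if piece is None:
            out.append(BLACK_SQUARE if dark else WHITE_SQUARE)
        else:
            # No 25A8 is emitted here: the selector alone marks the square.
            out.append(piece + (ON_BLACK if dark else ON_WHITE))
    return "".join(out)


# White's back rank (rank 1) and an empty rank 3, files a to h.
back_rank = ["\u2656", "\u2658", "\u2657", "\u2655",
             "\u2654", "\u2657", "\u2658", "\u2656"]
rank_1 = encode_rank(back_rank, 0)
rank_3 = encode_rank([None] * 8, 2)
```

Because the base characters are unchanged, a font without these glyphs would still show the ordinary chess symbols and squares.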
Michael Everson

From gwalla at gmail.com Sat Apr 1 15:42:01 2017
From: gwalla at gmail.com (Garth Wallace)
Date: Sat, 1 Apr 2017 13:42:01 -0700
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <1766481648.24541.1491076627544.JavaMail.open-xchange@app08.ox.hosteurope.de>
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <1766481648.24541.1491076627544.JavaMail.open-xchange@app08.ox.hosteurope.de>
Message-ID:

On Sat, Apr 1, 2017 at 12:57 PM, Christoph Päper <christoph.paeper at crissov.de> wrote:

> Michael Everson :
> >
> > Variation Sequences have been implemented for a number of symbol characters
> > recently to make them useful for specialized purposes.
>
> This is where I still suspected there was an April Fools joke coming up.
>
> > Here is a proposal which solves a long-standing problem for an important set
> > of symbols in the UCS.
>
> # Chesspiece on white versus Chesspiece on black variation sequences
> 25A1 FE00; White chessboard square; # WHITE SQUARE
> 25A8 FE01; Black chessboard square; # SQUARE WITH UPPER RIGHT TO LOWER LEFT FILL
> 2654 FE00; Chesspiece on white; # WHITE CHESS KING
> 2654 FE01; Chesspiece on black; # WHITE CHESS KING
> 2655 FE00; Chesspiece on white; # WHITE CHESS QUEEN
> 2655 FE01; Chesspiece on black; # WHITE CHESS QUEEN
> 2656 FE00; Chesspiece on white; # WHITE CHESS ROOK
> 2656 FE01; Chesspiece on black; # WHITE CHESS ROOK
> 2657 FE00; Chesspiece on white; # WHITE CHESS BISHOP
> 2657 FE01; Chesspiece on black; # WHITE CHESS BISHOP
> 2658 FE00; Chesspiece on white; # WHITE CHESS KNIGHT
> 2658 FE01; Chesspiece on black; # WHITE CHESS KNIGHT
> 2659 FE00; Chesspiece on white; # WHITE CHESS PAWN
> 2659 FE01; Chesspiece on black; # WHITE CHESS PAWN
> 265A FE00; Chesspiece on white; # BLACK CHESS KING
> 265A FE01; Chesspiece on black; # BLACK CHESS KING
> 265B FE00; Chesspiece on white; # BLACK CHESS QUEEN
> 265B FE01; Chesspiece on black; # BLACK CHESS QUEEN
> 265C FE00;
Chesspiece on white; # BLACK CHESS ROOK
> 265C FE01; Chesspiece on black; # BLACK CHESS ROOK
> 265D FE00; Chesspiece on white; # BLACK CHESS BISHOP
> 265D FE01; Chesspiece on black; # BLACK CHESS BISHOP
> 265E FE00; Chesspiece on white; # BLACK CHESS KNIGHT
> 265E FE01; Chesspiece on black; # BLACK CHESS KNIGHT
> 265F FE00; Chesspiece on white; # BLACK CHESS PAWN
> 265F FE01; Chesspiece on black; # BLACK CHESS PAWN
> 26C0 FE00; Draughts piece on white; # WHITE DRAUGHTS MAN
> 26C0 FE01; Draughts piece on black; # WHITE DRAUGHTS MAN
> 26C1 FE00; Draughts piece on white; # WHITE DRAUGHTS KING
> 26C1 FE01; Draughts piece on black; # WHITE DRAUGHTS KING
> 26C2 FE00; Draughts piece on white; # BLACK DRAUGHTS MAN
> 26C2 FE01; Draughts piece on black; # BLACK DRAUGHTS MAN
> 26C3 FE00; Draughts piece on white; # BLACK DRAUGHTS KING
> 26C3 FE01; Draughts piece on black; # BLACK DRAUGHTS KING
>
> □ U+25A1 and, especially, ▨ U+25A8 for empty fields on a board make no sense.
> U+25A8 always shows as diagonals from the lower left to the upper right (much like a forward slash /). Black fields are often hatched this way, but could also be shown with a solid fill ■ U+25A0, a reverse diagonal fill ▧ U+25A7, a diamond pattern (diagonal crosshatch) ▩ U+25A9, a square pattern (orthogonal crosshatch) ▦ U+25A6, a vertical pattern ▥ U+25A5 or a horizontal pattern ▤ U+25A4.

Technically any of those shadings would be understood (and I doubt if anyone would notice if the lines ran in the other diagonal direction), but in practice dark squares in typeset diagrams are almost invariably hatched in the bottom left to top right direction. Diagrams in image form may use solid color fill, but that's not relevant to Unicode: this proposal is meant to provide a standardized basis for the existing practice of typesetting chess diagrams in black-and-white text, not to supplant images.

> I suggest you adopt a space character instead, e.g. U+2003 Em Space or U+2001 Em Quad.
>
> 2003 FE00; White chessboard square; # EM SPACE
> 2003 FE01; Black chessboard square; # EM SPACE
>
> 2001 FE00; White chessboard square; # EM QUAD
> 2001 FE01; Black chessboard square; # EM QUAD
>
> □ U+25A1, ▢ U+25A2 or ⬚ U+2B1A would also work if you wanted a minimal amount of ink but not none.
>
> 25A1 FE00; White chessboard square; # WHITE SQUARE
> 25A1 FE01; Black chessboard square; # WHITE SQUARE
>
> 25A2 FE00; White chessboard square; # WHITE SQUARE WITH ROUNDED CORNERS
> 25A2 FE01; Black chessboard square; # WHITE SQUARE WITH ROUNDED CORNERS
>
> 2B1A FE00; White chessboard square; # DOTTED SQUARE
> 2B1A FE01; Black chessboard square; # DOTTED SQUARE
>
> You should also evaluate a different approach altogether:
>
> 20DE FE00; Combining white chessboard square; # COMBINING ENCLOSING SQUARE
> 20DE FE01; Combining black chessboard square; # COMBINING ENCLOSING SQUARE
>
> 20DE FE00; Combining white background; # COMBINING ENCLOSING SQUARE
> 20DE FE01; Combining black background; # COMBINING ENCLOSING SQUARE
>
> Although one would need to combine it with a space character as a base for empty
> fields, this would require only two new entries in StandardizedVariants.txt and
> be more flexible regarding alternate (Fairy Chess) game pieces

COMBINING ENCLOSING SQUARE already has its own uses in fairy chess problems, to mark pieces with additional properties (such as paralyzing pieces or magic pieces) or transient identities (chameleons and half-neutrals). It would not be appropriate for this purpose.

> including emojis.
>

No chess symbols, encoded or proposed, are emoji, nor should they be.
From 637275 at gmail.com Sat Apr 1 16:09:55 2017
From: 637275 at gmail.com (Rebecca T)
Date: Sat, 1 Apr 2017 17:09:55 -0400
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <1766481648.24541.1491076627544.JavaMail.open-xchange@app08.ox.hosteurope.de>
Message-ID:

> No chess symbols, encoded or proposed, are emoji, nor should they be.

Except on Samsung.

From kent.karlsson14 at telia.com Sat Apr 1 16:30:39 2017
From: kent.karlsson14 at telia.com (Kent Karlsson)
Date: Sat, 01 Apr 2017 23:30:39 +0200
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com>
Message-ID:

2654 FE00; Chesspiece on white; # WHITE CHESS KING

Why do the ones with white background need a variation selector?

25A1 FE00; White chessboard square; # WHITE SQUARE
25A8 FE01; Black chessboard square; # SQUARE WITH UPPER RIGHT TO LOWER LEFT FILL

I see that you want a fallback in case the variation selectors aren't supported; but isn't the convention that one "always" starts with FE00 for each character that may have variation selectors applied?

So in this case, one would only need variation selector FE00; if applied to 25A1 or 25A8 giving the chess board variety, if applied to a chess piece character, gives "checkered" ("black") background (without, one gets the white background).

Why not use 25A0 BLACK SQUARE with the variation selector? (I know that it would not be entirely black with the variation selector (if not fallback).) I mean, there is no absolute LOGICAL NEED to draw the "black" background as WITH UPPER RIGHT TO LOWER LEFT FILL, it could go the other direction or be just "gray" (or for that matter medium blue...); font maker choice.
Kind regards
/Kent K

From verdy_p at wanadoo.fr Sat Apr 1 16:35:33 2017
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sat, 1 Apr 2017 23:35:33 +0200
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <1766481648.24541.1491076627544.JavaMail.open-xchange@app08.ox.hosteurope.de>
Message-ID:

2017-04-01 23:09 GMT+02:00 Rebecca T <637275 at gmail.com>:

> > No chess symbols, encoded or proposed, are emoji, nor should they be.
>
> Except on Samsung.
>

Except that the sample in this article mixes the colors (black symbol vs. white piece emoji, only slightly darkened with its 3D shadows). I hope that Samsung is making a clear distinction in its emojis, otherwise it is not a replacement of the symbol, and skin color modifiers "may" have been used too.

Note that the previous discussion talks about black and white patterns, but in reality the patterns are just there to emulate color or lightness/darkness. I don't think there's any real difference if the pattern hatches are oriented like /// or \\\, or if grid patterns are rotated 0°, 30°, 45°: these patterns are used to get a visual feeling, as the exact number of strokes is not significant (only the visual black vs. white coverage rate is significant, and high-resolution devices may freely use thinner stroke widths depending on pixel/subpixel sizes, optical filters or ink droplet/powder sizes and absorption/diffusion by the printing support). The same applies too for human skin color modifiers.

On typical color (or grey) displays or polychromatic printing, these patterns will not be used; real colorized fills will be used for more clarity.
Today's printing techniques use much higher precision, and patterns used in old books or maps are no longer needed (and paper surface quality/regularity today is much better than it was in the past, even for basic newspapers using cheap recycled paper, where polychromatic printing is also used, notably on pages related to leisure time, games, TV programs, weather maps, photos of celebrities... and advertising! Printing masks are generated using high-resolution lasers and ink quality is much better too).

From verdy_p at wanadoo.fr Sat Apr 1 16:49:30 2017
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sat, 1 Apr 2017 23:49:30 +0200
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com>
Message-ID:

I think it's all about sizing so that white or black cells will align, independently of the piece that may be within it.

However if FE00 and FE01 give the background color distinction, why is the base different (25A1 vs. 25A8) for the empty cell, when it is no different for the same piece (in the same white or black color)?

I see the separation of the base only for the borders (touching outside the checkers board), which may also be reduced to a minimal thin edge over a small margin... or nothing at all (completely transparent) if the font already includes a thin contrasting cell on the squares (e.g. grey ridges with 3D effects). The outer background on which these borders are drawn may also be already contrasting with another color (yellow, green, blue), and checkers may also use other pairs of contrasting colors (e.g. beige/ivory vs. brown): The FE00 and FE01 select an Emoji style with more freedom in shapes and colors for the piece and more precise and coherent sizes but a required square cell.
Their absence just means an isolated piece outside the checker board and without required backgrounds or without monospaced margins, suitable for inclusion in text.

2017-04-01 23:30 GMT+02:00 Kent Karlsson :

> 2654 FE00; Chesspiece on white; # WHITE CHESS KING
>
> Why do the ones with white background need a variation selector?
>
> 25A1 FE00; White chessboard square; # WHITE SQUARE
> 25A8 FE01; Black chessboard square; # SQUARE WITH UPPER RIGHT TO LOWER LEFT FILL
>
> I see that you want a fallback in case the variation selectors aren't
> supported; but isn't the convention that one "always" start with FE00
> for each character that may have variation selectors applied?
>
> So in this case, one would only need variation selector FE00; if applied
> to 25A1 or 25A8 giving the chess board variety, if applied to a chess piece
> character, gives "checkered" ("black") background (without, one gets the
> white background).
>
> Why not use 25A0 BLACK SQUARE with the variation selector? (I know that
> it would not entirely black with the variation selector (if not fallback).)
> I mean, there is no absolute LOGICAL NEED to draw the "black" background
> as WITH UPPER RIGHT TO LOWER LEFT FILL, it could go the other direction
> or be just "gray" (or for that matter medium blue...); font maker choice.
>
> Kind regards
> /Kent K
>

From kent.karlsson14 at telia.com Sat Apr 1 16:57:06 2017
From: kent.karlsson14 at telia.com (Kent Karlsson)
Date: Sat, 01 Apr 2017 23:57:06 +0200
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com>
Message-ID:

In addition, not directly related to your proposal, why aren't chess pieces listed in http://unicode.org/emoji/charts/emoji-variants.html.

It seems to me that chess pieces would be very well suited to have each an emoji variant (not to be used for the chess boards, maybe).
/Kent K

PS
Remember that Emoji style (or not) uses two OTHER variation selectors, FE0F (and FE0E).

From everson at evertype.com Sat Apr 1 18:31:19 2017
From: everson at evertype.com (Michael Everson)
Date: Sun, 2 Apr 2017 01:31:19 +0200
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References:
Message-ID: <2AB2D979-2370-4E95-897D-4D499472B4B2@evertype.com>

Kent,

Please do not drag chess pieces into discussions about emoji right now. Do it later if you must. This proposal is designed to provide a standardized higher-level protocol for the use of chess characters in chess data, to enable the chess community to make good use of the long-encoded chess characters.

Michael

> On 1 Apr 2017, at 23:57, Kent Karlsson wrote:
>
> In addition, not directly related to your proposal, why aren't chess pieces listed in http://unicode.org/emoji/charts/emoji-variants.html.
>
> It seems to me that chess pieces would be very well suited to have each an emoji variant (not to be used for the chess boards, maybe).
>
> /Kent K
>
> PS
> Remember that Emoji style (or not) uses two OTHER variation selectors, FE0F (and FE0E).
>

From everson at evertype.com Sat Apr 1 18:33:14 2017
From: everson at evertype.com (Michael Everson)
Date: Sun, 2 Apr 2017 01:33:14 +0200
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References:
Message-ID: <2B915B48-06DB-4C80-AEEE-BE255D2DD407@evertype.com>

On 1 Apr 2017, at 23:30, Kent Karlsson wrote:

> 2654 FE00; Chesspiece on white; # WHITE CHESS KING
>
> Why do the ones with white background need a variation selector?

Because for the typesetting to work the glyph has to have the same precise square metrics as the ones on the black square (it is not a "background"), and the chess characters when used as ordinary symbols in text need not have such metrics. (And do not, in most fonts.)
> 25A1 FE00; White chessboard square; # WHITE SQUARE
> 25A8 FE01; Black chessboard square; # SQUARE WITH UPPER RIGHT TO LOWER LEFT FILL
>
> I see that you want a fallback in case the variation selectors aren't supported;

I am not sure what you mean. If the variation selector isn't supported then the glyph will not have metrics suitable for setting a chessboard.

> but isn't the convention that one "always" start with FE00 for each character that may have variation selectors applied?

I don't know what you mean by this. As shown in Figure 2, a white knight for instance may occur on its own, or may occur on a white board square or on a black board square. I don't think the first needs differentiation, which is why the variation sequences apply only to the on-board-square glyphs.

> So in this case, one would only need variation selector FE00; if applied to 25A1 or 25A8 giving the chess board variety, if applied to a chess piece character, gives "checkered" ("black") background (without, one gets the white background).

No, a chesspiece symbol can (and nearly always does) appear on its own in text without square metrics. "Being on a white square" is a specific glyph state, different from "being a symbol on its own".

> Why not use 25A0 BLACK SQUARE with the variation selector? (I know that it would not be entirely black with the variation selector (if not fallback).)

Because the conventional international shading for a black square is the /// one, and using that facilitates legibility in environments where OpenType features are not enabled even if the VS characters are present.

> I mean, there is no absolute LOGICAL NEED to draw the "black" background as WITH UPPER RIGHT TO LOWER LEFT FILL, it could go the other direction or be just "gray" (or for that matter medium blue...); font maker choice.

Since it doesn't "matter" what character is used I chose the one which is most typical, and stand by that choice.
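The three glyph states described in this message (a symbol on its own, on a white square, on a black square) could be recovered from plain text roughly as follows. This is a sketch only, in Python; classify and strip_selectors are hypothetical names, and a stray selector with no base before it is simply reported as a bare character.

```python
VS_WHITE = "\uFE00"  # proposed: glyph with square metrics, on a white square
VS_BLACK = "\uFE01"  # proposed: glyph with square metrics, on a black square


def classify(text: str) -> list:
    """Return (base, state) pairs; state is 'bare', 'white-square' or 'black-square'."""
    result = []
    i = 0
    while i < len(text):
        base = text[i]
        following = text[i + 1] if i + 1 < len(text) else ""
        if following == VS_WHITE:
            result.append((base, "white-square"))
            i += 2
        elif following == VS_BLACK:
            result.append((base, "black-square"))
            i += 2
        else:
            result.append((base, "bare"))  # ordinary symbol in running text
            i += 1
    return result


def strip_selectors(text: str) -> str:
    """Fallback form: what remains if the selectors are dropped or ignored."""
    return text.replace(VS_WHITE, "").replace(VS_BLACK, "")


knight = "\u2658"  # WHITE CHESS KNIGHT
states = classify(knight + knight + VS_WHITE + knight + VS_BLACK)
fallback = strip_selectors("\u25A8" + VS_BLACK + knight + VS_WHITE)
```

The fallback string keeps the hatched square character for empty black squares, which is the legibility point made above for environments that do not apply the selectors.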
All the best,
Michael Everson

> Kind regards
> /Kent K

From gwalla at gmail.com Sat Apr 1 19:50:16 2017
From: gwalla at gmail.com (Garth Wallace)
Date: Sat, 1 Apr 2017 17:50:16 -0700
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <1766481648.24541.1491076627544.JavaMail.open-xchange@app08.ox.hosteurope.de>
Message-ID:

On Sat, Apr 1, 2017 at 2:09 PM, Rebecca T <637275 at gmail.com> wrote:

> > No chess symbols, encoded or proposed, are emoji, nor should they be.
>
> Except on Samsung.
>

They do not *officially* have emoji presentation. Samsung does what it wants.

From kent.karlsson14 at telia.com Sat Apr 1 20:16:39 2017
From: kent.karlsson14 at telia.com (Kent Karlsson)
Date: Sun, 02 Apr 2017 03:16:39 +0200
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <2B915B48-06DB-4C80-AEEE-BE255D2DD407@evertype.com>
Message-ID:

On 2017-04-02 01:33, "Michael Everson" wrote:

>> but isn't the convention that one "always" start with FE00 for each character
>> that may have variation selectors applied?
>
> I don't know what you mean by this.

> 25A8 FE01; Black chessboard square; # SQUARE WITH UPPER RIGHT TO LOWER LEFT FILL

In this case, the "set of variation selectors" for 25A8 excludes FE00.

/Kent K

From everson at evertype.com Sun Apr 2 01:19:00 2017
From: everson at evertype.com (Michael Everson)
Date: Sun, 2 Apr 2017 08:19:00 +0200
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com>
Message-ID:

> On 1 Apr 2017, at 23:49, Philippe Verdy wrote:
>
> I think it's all about sizing so that white or black cells will align, independently of the piece that may be within it.

A white knight may stand alone in text, in which case no variation exists for display beyond the base glyph in the font.

A white knight may need to be represented with specific em-square-related metrics within the font, in two variations, one with no fill on the background of the em-square indicating the piece on a white square, and one with a (typically ///-shaped) fill indicating the piece on a black square.

> However if FE00 and FE01 give the background color distinction, why is the base different (25A1 vs. 25A8) for the empty cell, when it is no different for the same piece (in the same white or black color)?

In the second row in Figure 3, a chessboard is given without any variation selectors at all. This is not beautiful presentation but it is nevertheless legible in plain text. This is more advantageous to the user than having e.g. WHITE SQUARE display both as white with em-square metrics and as hatched with em-square metrics.

> I see the separation of the base only for the borders (touching outside the checkers board),

The characters used for the horizontal and vertical borders and corners are optional and do not require variation selectors. In a chess font they only need to be drawn with the appropriate metrics to match up with the board squares.

> which may also be reduced to a minimal thin edge over a small margin... or nothing at all (completely transparent) if the font already includes a thin contrasting cell on the squares (e.g. grey ridges with 3D effects). The outer background on which these borders are drawn may also be already contrasting with another color (yellow, green, blue), and checkers may also use other pairs of contrasting colors (e.g. beige/ivory vs. brown):

This proposal is about black and white glyphs for ordinary printing of plain text with appropriate font glyphs in the conventional way of displaying chessboard data.

In lead type, a white knight on a white square was a separate character from a white knight on a black square. This proposal uses variation selectors to select such glyphs, preserving character identity of chess pieces as already encoded. The alternative would be to encode *WHITE CHESS KNIGHT ON BLACK SQUARE which when mooted in the past was rejected.

> The FE00 and FE01 select an Emoji style

NO, THEY DO NOT.

> with more freedom in shapes and colors for the piece and more precise and coherent sizes but a required square cell. Their absence just means an isolated piece outside the checker board and without required backgrounds or without monospaced margins, suitable for inclusion in text.

The proposal does no more nor less than it says it does. It has been carefully thought out and tested in font implementation, typeset as you see in the proposal in a program which respects the OpenType features.

Michael Everson

From richard.wordingham at ntlworld.com Sun Apr 2 03:53:22 2017
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sun, 2 Apr 2017 09:53:22 +0100
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com>
Message-ID: <20170402095322.17526d87@JRWUBU2>

On Sun, 2 Apr 2017 08:19:00 +0200 Michael Everson wrote:

> > On 1 Apr 2017, at 23:49, Philippe Verdy wrote:
> >
> > I think it's all about sizing so that white or black cells will align, independently of the piece that may be within it.
>
> A white knight may stand alone in text, in which case no variation exists for display beyond the base glyph in the font.
>
> A white knight may need to be represented with specific em-square-related metrics within the font, in two variations, one with no fill on the background of the em-square indicating the piece on a white square, and one with a (typically ///-shaped) fill indicating the piece on a black square.

I'm uneasy about the semantics of the sequences. To take the extreme example, would be the white bishop on a (or 'with its') white square while would be the white bishop on a black square. Perhaps someone can show me evidence from mathematical symbols or Japanese kanji that such semantic modifications are perfectly acceptable.

I have related unease about the glyph of , intended for a white rook on a black square, being used in text where the meaning is 'white rook' regardless of what square it is on.

If these variation sequences are accepted, I hope that the intention that they contribute to producing presentable populated chess boards in plain text will be captured in at least the Unicode Standard. I can see issues with line-spacing, which I believe is formally out of control in true plain text.

If my unease is well-founded, then I think we have a case for two combining marks akin to U+20E3 COMBINING ENCLOSING KEYCAP. Unfortunately, that would not be as simple to use (or define) as the proposed variation sequences.

I'm also bothered by the purposes of the Format 14 'cmap' subtable. For each supported variation selector, it has a direct and an indirect mapping of base character to glyph. The direct mapping maps the character to the glyph to be used when qualified by the variation selector; that makes perfect sense. The indirect mapping gives a list of characters for which the 'default' glyph is to be used, i.e. the cmap subtables that do not directly support variation selectors are to be used. As I understand it, this list will in general not include all the characters supported by the font but without a non-default variation sequence mapping. Thus if a font supports the use of U+E0100 VARIATION SELECTOR-17, the table for U+E0100 will not have mention of U+0030, for is not a standardised variation sequence, and therefore no font should support it.
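
The two mappings described above appear to correspond to what the OpenType specification calls the "non-default UVS" table (direct) and the "default UVS" table (indirect). As a rough sketch of the lookup only, assuming a much simplified model in which the default UVS table is a plain set rather than a list of ranges, it might look like the following. The class and function names, the glyph names and the sample font data are all hypothetical and are not taken from any real font or library.

```python
from typing import Dict, Optional, Set


class SelectorRecord:
    """Simplified stand-in for one variation selector's Format 14 entry."""

    def __init__(self, default_uvs: Set[int], non_default_uvs: Dict[int, str]):
        # "Indirect" mapping: bases whose ordinary cmap glyph is to be used.
        self.default_uvs = default_uvs
        # "Direct" mapping: base code point -> glyph chosen for the sequence.
        self.non_default_uvs = non_default_uvs


def lookup_sequence(
    base: int,
    selector: int,
    base_cmap: Dict[int, str],
    format14: Dict[int, SelectorRecord],
) -> Optional[str]:
    """Return the glyph for <base, selector>, or None if the font does not
    explicitly support that sequence."""
    record = format14.get(selector)
    if record is None:
        return None
    if base in record.non_default_uvs:
        return record.non_default_uvs[base]
    if base in record.default_uvs:
        return base_cmap.get(base)
    return None


# Hypothetical font data, for illustration only.
base_cmap = {0x0030: "zero", 0x2654: "uni2654", 0x82A6: "uni82A6"}
format14 = {
    0xFE00: SelectorRecord(set(), {0x2654: "uni2654FE00"}),
    0xFE01: SelectorRecord(set(), {0x2654: "uni2654FE01"}),
    0xE0100: SelectorRecord({0x82A6}, {}),
}

print(lookup_sequence(0x2654, 0xFE01, base_cmap, format14))
print(lookup_sequence(0x82A6, 0xE0100, base_cmap, format14))
print(lookup_sequence(0x0030, 0xE0100, base_cmap, format14))
```

In this model a result of None is what lets a renderer tell "sequence not catered for" apart from "sequence catered for with the default glyph".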

I believe the purpose of having the indirect mapping is so that one can query whether a font explicitly supports a sequence. Thus the cmap distinguishes two cases:

1) is explicitly catered for, and uses the same glyph as unqualified U+82A6.

2) is not catered for. If one needs to be sure of having its distinctive form, another font must be used.

If I have understood the intended use correctly, then we need another variation sequence to explicitly specify a glyph of U+2656 suitable for use in plain-looking running text, analogous to for a text-style '2'. A renderer can then ask whether a font supports plain text white rooks, as opposed to providing one dimensioned for assembling chess boards.

Richard.

From everson at evertype.com Sun Apr 2 04:57:49 2017
From: everson at evertype.com (Michael Everson)
Date: Sun, 2 Apr 2017 11:57:49 +0200
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <20170402095322.17526d87@JRWUBU2>
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2>
Message-ID: <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com>

On 2 Apr 2017, at 10:53, Richard Wordingham wrote:

>> A white knight may need to be represented with specific em-square-related metrics within the font, in two variations, one with no fill on the background of the em-square indicating the piece on a white square, and one with a (typically ///-shaped) fill indicating the piece on a black square.
>
> I'm uneasy about the semantics of the sequences.

There aren't any, not really. This isn't a problem.

> To take the extreme example, would be the white bishop on a (or 'with its') white square

No, "on a" is correct. It's a display mode. The semantics (such as they are) of a white knight have to do with its position in the 8 x 8 chessboard matrix (which can be parsed in plain text, which is why this proposal is useful in terms of chessboard data).
In Figure 3, even in the inadequately formatted examples the plain text arrangement of the squares and chess characters can be read and understood. Proper formatting requires shadowing when a chess piece is on a black square, but whether that square is (in algebraic notation) c6 or d7 is a matter of the matrix, not the font display. This proposal permits a regular

> while would be the white bishop on a black square. Perhaps someone can show me evidence from mathematical symbols or Japanese kanji that such semantic modifications are perfectly acceptable.

There isn't a semantic modification. It's a graphic modification, just like Even the emoji VSes regulate display, not semantics. Thus:

2194 ↔ LEFT RIGHT ARROW
= z notation relation
? 2194 FE0E text style
? 2194 FE0F emoji style

This is a matter of display, of font glyph selection.

> I have related unease about the glyph of , intended for a white rook on a black square, being used in text where the meaning is 'white rook' regardless of what square it is on.

That sequence selects a glyph in the font which draws the white rook surrounded by the diagonal lines which indicate a black square. It has no semantics but what you read into it. (OK, it "means" the rook could logically be on a1, c1, e1, g1, b2, d2, f2, h2, a3, c3, e3, g3, b4, d4, f4, h4 etc., but this is nothing to feel uneasy about.) And what, pray, is the alternative? A full matrix like this:

??????????
??????????????????
??????????????????
??????????????????
??????????????????
??????????????????
??????????????????
??????????????????
??????????????????
??????????

is a set of characters which can be parsed. It's text which can be parsed, and because it uses Unicode characters (rather than the ASCII and Symbol font hacks described in §2 of the proposal) it can be sent and received with fidelity and can be displayed nicely with a conformant font.
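
As a minimal sketch of what "text which can be parsed" could mean in practice, the following builds and reads back one rank of a board using the sequences quoted in this thread (piece or square followed by FE00 for a white square, FE01 for a black square; 25A1 and 25A8 as the empty squares). The function names are hypothetical, and the square-colour rule assumes the usual convention that a1 is a black square.

```python
VS_WHITE = "\uFE00"  # VARIATION SELECTOR-1: glyph set on a white square
VS_BLACK = "\uFE01"  # VARIATION SELECTOR-2: glyph set on a black square
EMPTY_WHITE = "\u25A1"  # WHITE SQUARE
EMPTY_BLACK = "\u25A8"  # SQUARE WITH UPPER RIGHT TO LOWER LEFT FILL


def square_is_black(file_index, rank_index):
    """Zero-based file (a=0) and rank (1=0); a1 is taken to be black."""
    return (file_index + rank_index) % 2 == 0


def encode_rank(pieces, rank_index):
    """pieces: 8 entries, each a chess-piece character or None for empty."""
    out = []
    for file_index, piece in enumerate(pieces):
        black = square_is_black(file_index, rank_index)
        base = piece if piece is not None else (EMPTY_BLACK if black else EMPTY_WHITE)
        out.append(base + (VS_BLACK if black else VS_WHITE))
    return "".join(out)


def strip_selectors(text):
    """What a reader sees as fallback when the selectors are not acted on."""
    return "".join(ch for ch in text if ch not in (VS_WHITE, VS_BLACK))


def decode_rank(text):
    """Recover the pieces; the square colour follows from the position."""
    return [
        None if ch in (EMPTY_WHITE, EMPTY_BLACK) else ch
        for ch in strip_selectors(text)
    ]


# White's back rank: rook, knight, bishop, queen, king, bishop, knight, rook.
rank1 = ["\u2656", "\u2658", "\u2657", "\u2655", "\u2654", "\u2657", "\u2658", "\u2656"]
encoded = encode_rank(rank1, 0)
print(" ".join(f"{ord(ch):04X}" for ch in encoded))
print(decode_rank(encoded) == rank1)
```

Note that decoding ignores the selectors entirely, which is consistent with the point made above that the position in the matrix, not the glyph, carries the information.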

> If these variation sequences are accepted, I hope that the intention that they contribute to producing presentable populated chess boards in plain text will be captured in at least the Unicode Standard.

My intention would be to re-format the proposal document as a UTN for guidance to implementors, if that's what you mean.

> I can see issues with line-spacing, which I believe is formally out of control in true plain text.

So is the font rendering. The board which I have pasted in above in this e-mail doesn't look great in Everson Mono (which is what I use to view my plain-text e-mail), because I haven't added the sequences to that font yet, but it is legible. And I can cut and paste it into another document where I have more font control. Yes, some control over line-spacing might be needed in some environments for optimum results. That's why the proposal says things like "set in Ludus in 24 points with 26-point leading" where relevant.

> If my unease is well-founded, then I think we have a case for two combining marks akin to U+20E3 COMBINING ENCLOSING KEYCAP. Unfortunately, that would not be as simple to use (or define) as the proposed variation sequences.

We could add some *COMBINING WHITE GAME SQUARE FILTER and *COMBINING BLACK GAME SQUARE FILTER, but this does not simplify matters. First, you would have to decide what base character to use for the squares on which no characters stand. I think that the proposed 25A1 WHITE SQUARE and 25A8 SQUARE WITH UPPER RIGHT TO LOWER LEFT FILL make better sense because in environments where the OpenType features cannot be supplied the plain text is still legible, if not beautiful. Your suggestion is not going to alter the burden on the font with regard to display.

> I'm also bothered by the purposes of the Format 14 'cmap' subtable.

I took this text from a successful proposal dealing with variation selectors by Ken Lunde of Adobe. I am not attached to it.
To me, the instruction:

sub uni2654 uniFE00 by uni2654FE00 ;
sub uni2654 uniFE01 by uni2654FE01 ;

is all that I implemented, and the result was what was expected.

> For each supported variation selector, it has a direct and an indirect mapping of base character to glyph. The direct mapping maps the character to the glyph to be used when qualified by the variation selector; that makes perfect sense. The indirect mapping gives a list of characters for which the 'default' glyph is to be used, i.e. the cmap subtables that do not directly support variation selectors are to be used. As I understand it, this list will in general not include all the characters supported by the font but without a non-default variation sequence mapping. Thus if a font supports the use of U+E0100 VARIATION SELECTOR-17, the table for U+E0100 will not have mention of U+0030, for is not a standardised variation sequence, and therefore no font should support it.

Um, I don't understand a word of what you've said here, whatever you mean by "direct mapping" and "indirect mapping". All I know is that I used OpenType rules in my font to get sequences to point to certain glyphs, and the result works as intended. You can see this in the proposal document.

> I believe the purpose of having the indirect mapping is so that one can query whether a font explicitly supports a sequence. Thus the cmap distinguishes two cases:
>
> 1) is explicitly catered for, and used the same glyph as unqualified U+82A6.

I'm not using E0100. And I'm not using CJK character ?. I did not propose sequences for unqualified chess pieces because I didn't see any reason why there should be a benefit for it.
If there is some genuine benefit, obviously the sequences in my proposal could be altered from

2654 FE00; Chesspiece on white; # WHITE CHESS KING
2654 FE01; Chesspiece on black; # WHITE CHESS KING

(that is:

sub uni2654 uniFE00 by uni2654FE00 ;
sub uni2654 uniFE01 by uni2654FE01 ;)

to

2654 FE00; Unqualified chesspiece; # WHITE CHESS KING
2654 FE01; Chesspiece on white; # WHITE CHESS KING
2654 FE02; Chesspiece on black; # WHITE CHESS KING

(that is:

sub uni2654 uniFE00 by uni2654 ;
sub uni2654 uniFE01 by uni2654FE02 ;
sub uni2654 uniFE02 by uni2654FE01 ;)

But I didn't see any need for that, since 2654 is already the unqualified chesspiece. If there's a formal need for triplets rather than couplets here, I'll conform to it, but that seems to be incidental to the robustness of the proposal.

> 2) is not catered for. If one needs to be sure of having its distinctive form, another font must be used.
>
> If I have understood the intended use correctly, then we need another variation sequence to explicitly specify a glyph of U+2656 suitable for use in plain-looking running text, analogous to for a text-style '2'. A renderer can then ask whether a font supports plain text white rooks, as opposed to providing one dimensioned for assembling chess boards.

If a font doesn't support a glyph or a sequence, then operating systems substitute other glyphs or the .notdef glyph or whatever, no?

Michael Everson

From verdy_p at wanadoo.fr Sun Apr 2 05:54:48 2017
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sun, 2 Apr 2017 12:54:48 +0200
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com>
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com>
Message-ID:

2017-04-02 11:57 GMT+02:00 Michael Everson :

> I'm not using E0100. And I'm not using CJK character ?.
I did not propose sequences for unqualified chess pieces because I didn't see any reason why there should be a benefit for it. If there is some genuine benefit, obviously the sequences in my proposal could be altered from
>
> 2654 FE00; Chesspiece on white; # WHITE CHESS KING
> 2654 FE01; Chesspiece on black; # WHITE CHESS KING
>
> (that is:
>
> sub uni2654 uniFE00 by uni2654FE00 ;
> sub uni2654 uniFE01 by uni2654FE01 ;)
>
> to
>
> 2654 FE00; Unqualified chesspiece; # WHITE CHESS KING
> 2654 FE01; Chesspiece on white; # WHITE CHESS KING
> 2654 FE02; Chesspiece on black; # WHITE CHESS KING
>
> (that is:
>
> sub uni2654 uniFE00 by uni2654 ;
> sub uni2654 uniFE01 by uni2654FE02 ;
> sub uni2654 uniFE02 by uni2654FE01 ;)
>
> But I didn't see any need for that, since 2654 is already the unqualified chesspiece. If there's a formal need for triplets rather than couplets here, I'll conform to it, but that seems to be incidental to the robustness of the proposal.
>
> > 2) is not catered for. If one needs to be sure of having its distinctive form, another font must be used.
> >
> > If I have understood the intended use correctly, then we need another variation sequence to explicitly specify a glyph of U+2656 suitable for use in plain-looking running text, analogous to for a text-style '2'. A renderer can then ask whether a font supports plain text white rooks, as opposed to providing one dimensioned for assembling chess boards.
>
> If a font doesn't support a glyph or a sequence, then operating systems substitute other glyphs or the .notdef glyph or whatever, no?

Semantically, using variation selectors for this usage seems a bit strange to me: you are adding a semantic for the "on a cell" which also affects the metrics and placement of the piece (to center it within the checkerboard cell).
What is represented is then BOTH a chess piece (such as 2654), AND a checkerboard cell (in your example you took 25A1 WHITE SQUARE, but if its metrics are appropriate for use in plain text, its margins are inappropriate for use in a checkerboard where cells should be touching without any margin).

There's still no reliable way to represent the empty cells except by adding a variation selector on the 25A1 WHITE SQUARE to transform it into a true cell. Then how to add the chess piece in it? In Unicode we traditionally use joiner controls to suggest a ligature. This would then produce something like: <25A1, VS-1, ZWJ, 2654>, the first part before ZWJ for the cell itself.

You are promoting a simpler encoding using pairs by encoding separate variants of the pieces themselves (two variants for the "on white cell" and "on black cell") but this is still not consistent for the empty cells: do you accept 00A0 NBSP to represent the absence of a piece, so that <00A0 FE00> and <00A0 FE01> will correctly represent the colored cells?

From richard.wordingham at ntlworld.com Sun Apr 2 11:27:10 2017
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sun, 2 Apr 2017 17:27:10 +0100
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com>
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com>
Message-ID: <20170402172710.54c37ad2@JRWUBU2>

On Sun, 2 Apr 2017 11:57:49 +0200 Michael Everson wrote:

> On 2 Apr 2017, at 10:53, Richard Wordingham wrote:

> > while would be the white bishop on a black square. Perhaps someone can show me evidence from mathematical symbols or Japanese kanji that such semantic modifications are perfectly acceptable.

> There isn't a semantic modification.
It's a graphic modification, just like Even the emoji VSes regulate display, not semantics. Thus:
>
> 2194 ↔ LEFT RIGHT ARROW
> = z notation relation
> ? 2194 FE0E text style
> ? 2194 FE0F emoji style
>
> This is a matter of display, of font glyph selection.

We seem to agree that it should be a graphic modification, rather than a semantic modification. The question I pose is, "Is it just a graphic modification in this case?". I'm not convinced that it is. A player starts with two non-interchangeable bishops. could only refer to the white bishop that is restricted to black squares. That's a semantic difference.

> > If these variation sequences are accepted, I hope that the intention that they contribute to producing presentable populated chess boards in plain text will be captured in at least the Unicode Standard.

> My intention would be to re-format the proposal document as a UTN for guidance to implementors, if that's what you mean.

> > I can see issues with line-spacing, which I believe is formally out of control in true plain text.
>
> So is the font rendering.

The immediate parallel that comes to mind is the ideographic square. A sequence of CJK ideographs should be a monospace sequence - and that is the major point of most of the ASCII clones with 'IDEOGRAPHIC' or 'FULLWIDTH' in their names. The uniform width is a key part of the semantic of the sequences being discussed.

> We could add some *COMBINING WHITE GAME SQUARE FILTER and *COMBINING BLACK GAME SQUARE FILTER, but this does not simplify matters. First, you would have to decide what base character to use for the squares on which no characters stand. I think that the proposed 25A1 WHITE SQUARE and 25A8 SQUARE WITH UPPER RIGHT TO LOWER LEFT FILL make better sense because in environments where the OpenType features cannot be supplied the plain text is still legible, if not beautiful.

U+00A0 makes a lot of sense as the base character.
Also having variants of U+25A1 and U+25A8 that match the game square filter modifiers seems quite legitimate. Possible lack of OpenType support is supposed not to be an admissible justification.

> Your suggestion is not going to alter the burden on the font with regard to display.

My suggestion actually increases it. I suggested it because it seems to be the proper thing to do. Variation sequences seem to be the easier solution - provided they are supported in the first place.

> to
>
> 2654 FE00; Unqualified chesspiece; # WHITE CHESS KING
> 2654 FE01; Chesspiece on white; # WHITE CHESS KING
> 2654 FE02; Chesspiece on black; # WHITE CHESS KING
>
> (that is:
>
> sub uni2654 uniFE00 by uni2654 ;
> sub uni2654 uniFE01 by uni2654FE02 ;
> sub uni2654 uniFE02 by uni2654FE01 ;)
>
> But I didn't see any need for that, since 2654 is already the unqualified chesspiece. If there's a formal need for triplets rather than couplets here, I'll conform to it, but that seems to be incidental to the robustness of the proposal.

It's an incidental detail, but if needed someone will have to attend to it. U+2654 is simply the chesspiece; a font that only had variants for white and 'black' backgrounds could nominate either as the glyph for U+2654 on its own.

> > 2) is not catered for. If one needs to be sure of having its distinctive form, another font must be used.
> >
> > If I have understood the intended use correctly, then we need another variation sequence to explicitly specify a glyph of U+2656 suitable for use in plain-looking running text, analogous to for a text-style '2'. A renderer can then ask whether a font supports plain text white rooks, as opposed to providing one dimensioned for assembling chess boards.
>
> If a font doesn't support a glyph or a sequence, then operating systems substitute other glyphs or the .notdef glyph or whatever, no?

No.
First of all, the substitution mechanism is usually above the operating system layer, with varying degrees of application control. Secondly, the mechanism can only look for a substitute if it knows that the glyph is missing. If it's looking for an OpenType font for a glyph of the family , the obvious mechanism is to consult the cmap format 14 subtable. The font gives no indication of what glyph families the font's default rendering of U+82A6 is supposed to belong to.

Richard.

From asmusf at ix.netcom.com Sun Apr 2 12:43:39 2017
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Sun, 2 Apr 2017 10:43:39 -0700
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <20170402172710.54c37ad2@JRWUBU2>
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2>
Message-ID: <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com>

An HTML attachment was scrubbed...

From richard.wordingham at ntlworld.com Sun Apr 2 12:52:51 2017
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sun, 2 Apr 2017 18:52:51 +0100
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com>
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com>
Message-ID: <20170402185251.58b95878@JRWUBU2>

On Sun, 2 Apr 2017 11:57:49 +0200 Michael Everson wrote:

> That's why the proposal says things like "set in Ludus in 24 points with 26-point leading" where relevant.

You forgot the most important setting though - that the higher-order protocols allow symbols to be displayed left-to-right. If the direction should happen to be right-to-left, not only is the game mirrored, but the board edges don't work properly, as the glyphs are not mirrored.
One needs each bidi-paragraph to be forced to the correct order, e.g. by use of LRM before and after, or, if the board is recorded right-to-left, RLM or ALM before and after.

Richard.

From christoph.paeper at crissov.de Sun Apr 2 18:21:04 2017
From: christoph.paeper at crissov.de (Christoph Päper)
Date: Mon, 3 Apr 2017 01:21:04 +0200 (CEST)
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <1766481648.24541.1491076627544.JavaMail.open-xchange@app08.ox.hosteurope.de>
Message-ID: <247062319.25007.1491175265004.JavaMail.open-xchange@app06.ox.hosteurope.de>

Michael Everson :

> On 1 Apr 2017, at 21:57, Christoph Päper wrote:
>
> > □ U+25A1 and, especially, ▨ U+25A8 for empty fields on a board make no sense.
>
> Not so. Think about the data.

I do, but I'm thinking about the character, too.

> Please distinguish characters from glyphs.

I do. To draw board diagrams, you need a character for "field with no game piece on it". You *do not* need two characters, "white field" and "black field"! It's just a (very strong) convention to draw every other field in a different color; they are also distinguished by coordinates A1 through H8. That is why I agree with your proposal to use variation sequences for chess and checkers pieces. The background color poses no semantic difference, except for the bishops perhaps.

I'm only suggesting you apply the same logic to empty fields. I evidently don't know which existing character serves that role best, but I strongly believe it should be a single one, not a pair chosen for their glyphs.

FE00; White chessboard square; #
FE01; Black chessboard square; #

instead of

FE00; White chessboard square; #
FE00; Black chessboard square; #

> The white square on a chessboard is not a separating nothingness. It's a white square.

Unless there are Fairy Chess boards that have adjacent squares of the same color or three different colors with arbitrary distribution, white and black are just optional visual cues for alternating fields. You might argue that using separate whitish and blackish square characters for empty fields provides for better fallback rendering, but the pieces will have no background and possibly render proportionally, too.

> The point is to have the underlying board data as plain text, and that means just using the chess pieces and conventional white and black squares.

No, this approach would properly require alternate code points for all chess pieces, just with a different background, like legacy fonts provide.

> > 20DE FE00; Combining white chessboard square; # COMBINING ENCLOSING SQUARE
> > 20DE FE01; Combining black chessboard square; # COMBINING ENCLOSING SQUARE
>
> That's not remotely tempting. It would offer no advantage and would needlessly complicate the system.

I meant the proposal should explain why this approach would be worse.

> > this would require only two new entries in StandardizedVariants.txt and be more flexible regarding alternate (Fairy Chess) game pieces

See? I provided two advantages.

> > ? including emojis.
>
> This proposal has nothing to do with emojis.

Maybe I should have included a winking smiley here.

> Should fairy chess characters be added to the standard, some additional entries would be added to StandardizedVariants.txt, yes.
> This is a finite set, however, and this should not be problematic to anybody.

With a Combining character, fellow Fairy Chess inventors would not be limited to the characters added specifically for this purpose. If they wanted to introduce a Dragon piece, for instance, they could use U+1F432-FE0E-20DE-FE00/1 to represent it.
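
To make the Dragon example concrete, here is a small sketch that assembles the code point sequence just described: an arbitrary base character, FE0E to request text style, 20DE COMBINING ENCLOSING SQUARE, and then FE00 or FE01 for the square colour. The <20DE, FE00> and <20DE, FE01> sequences are only the suggestion made in this message, not standardized variation sequences, and the function name is hypothetical.

```python
TEXT_STYLE = "\uFE0E"        # VARIATION SELECTOR-15 (text presentation)
ENCLOSING_SQUARE = "\u20DE"  # COMBINING ENCLOSING SQUARE
VS_WHITE = "\uFE00"          # suggested: enclosing square drawn as a white square
VS_BLACK = "\uFE01"          # suggested: enclosing square drawn as a black square


def piece_on_square(base, on_black, text_style=True):
    """Build base (+ FE0E) + 20DE + FE00/FE01 as suggested in the message."""
    sequence = base
    if text_style:
        sequence += TEXT_STYLE
    sequence += ENCLOSING_SQUARE
    sequence += VS_BLACK if on_black else VS_WHITE
    return sequence


dragon_on_white = piece_on_square("\U0001F432", on_black=False)
dragon_on_black = piece_on_square("\U0001F432", on_black=True)
print(" ".join(f"U+{ord(ch):04X}" for ch in dragon_on_white))
print(" ".join(f"U+{ord(ch):04X}" for ch in dragon_on_black))
```

The same function would accept any existing chess piece as the base, which is the flexibility being argued for here.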

From duerst at it.aoyama.ac.jp Mon Apr 3 05:31:07 2017
From: duerst at it.aoyama.ac.jp (Martin J. Dürst)
Date: Mon, 3 Apr 2017 19:31:07 +0900
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <20170402172710.54c37ad2@JRWUBU2>
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2>
Message-ID:

On 2017/04/03 01:27, Richard Wordingham wrote:

> We seem to agree that it should be a graphic modification, rather than a semantic modification. The question I pose is, "Is it just a graphic modification in this case?". I'm not convinced that it is. A player starts with two non-interchangeable bishops. could only refer to the white bishop that is restricted to black squares. That's a semantic difference.

That applies only to the bishop, and only in standard chess and those chess variants that keep the same restrictions. It's easily possible to imagine or invent variants where bishops can move differently, and it would be weird to use a semantic difference (e.g. different characters) for bishops, but a variation selector for other pieces.

Also it would be weird to try e.g. to "semantically" distinguish the two rooks, even if they are two different actual chess pieces on an actual board.

> The immediate parallel that comes to mind is the ideographic square. A sequence of CJK ideographs should be a monospace sequence - and that is the major point of most of the ASCII clones with 'IDEOGRAPHIC' or 'FULLWIDTH' in their names. The uniform width is a key part of the semantic of the sequences being discussed.

The full width/half width distinction mostly is a legacy (roundtrip) issue.

Regards, Martin.

From verdy_p at wanadoo.fr Mon Apr 3 06:11:50 2017
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Mon, 3 Apr 2017 13:11:50 +0200
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2>
Message-ID:

2017-04-03 12:31 GMT+02:00 Martin J. Dürst :

> Also it would be weird to try e.g. to "semantically" distinguish the two rooks, even if they are two different actual chess pieces on an actual board.

However, it is perfectly possible to have pseudo-variants using pieces with an annotation on them, such as numbers/letters/symbols added on top of them: with combining marks? Likewise, empty checkerboard cells may contain some marks, not just pieces.

From everson at evertype.com Mon Apr 3 07:12:52 2017
From: everson at evertype.com (Michael Everson)
Date: Mon, 3 Apr 2017 14:12:52 +0200
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <20170402172710.54c37ad2@JRWUBU2>
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2>
Message-ID:

On 2 Apr 2017, at 18:27, Richard Wordingham wrote:

> We seem to agree that it should be a graphic modification, rather than a semantic modification.

Yes, we do.

> The question I pose is, "Is it just a graphic modification in this case?".

Yes, it is.

> I'm not convinced that it is. A player starts with two non-interchangeable bishops. could only refer to the white bishop that is restricted to black squares. That's a semantic difference.

Surely not.
If it were, we would encode WHITE BISHOP THAT STAYS ON THE WHITE SQUARES and WHITE BISHOP THAT STAYS ON BLACK SQUARES and we would encode WHITE KNIGHT THAT MOVES FROM WHITE SQUARES TO BLACK SQUARES and WHITE KNIGHT THAT MOVES FROM BLACK SQUARES TO WHITE SQUARES. > The immediate parallel that comes to mind is the ideographic square. A sequence of CJK ideographs should be a monospace sequence - and that is the major point of most of the ASCII clones with 'IDEOGRAPHIC' or 'FULLWIDTH' in their names. The uniform width is a key part of the semantic of the sequences being discussed. I think you are seriously going the wrong way with this thinking. The immediate parallel that comes to mind is things like: 1000 MYANMAR LETTER KA; 1000 FE00 dotted form, where the character can still be read if the variation selector's glyph can't be shown. Uniform width is a feature of CJK, sure, but that's the nature of the writing system. Chess pieces for setting within ordinary text do NOT have to be an em wide, and they aren't in fonts. Chess pieces on a white square or on a black square do have to have a uniform width in order to produce the board matrix. > U+00A0 makes a lot of sense as the base character. What? NBSP and SP are whitespace characters, with complex behaviours, and chessboards, whether set in lead type or digitally, are sets of simple symbol glyphs. NBSP glues two things together. SP separates things. Chessboards are not collections of black squares glued together by white spaces with white spaces at the alternating ends of lines. I reject this analysis. > Also having variants of U+25A1 and U+25A8 that match the game square filter modifiers seems quite legitimate. Um, wait... What are you proposing NBSP for? I'm confused now. If you like these two characters (and I am glad you do) there's no need for U+00A0 at all. > Possible lack of OpenType support is supposed not to be an admissible justification. Well, I addressed this in the proposal. 
OpenType support for the symbol + VS sequences gives the desired result. A board prepared using this encoding proposal is legible even if not beautiful, but is nevertheless parseable, and in my view is a robust and convenient higher-level protocol which is certainly superior to the chaos that currently besets the chess community, who can't even reliably interchange chessboard data using their ASCII fonts due to the plethora of encodings still in use. (None of the chess fonts I have examined use the Unicode chess characters at all.) >> Your suggestion is not going to alter the burden on the font with regard to display. > > My suggestion actually increases it. I suggested it because it seems to be the proper thing to do. I can't agree. > Variation sequences seem to be the easier solution - provided they are supported in the first place. It is understood that not all environments may display such ligatures, but that's true for every character that uses a variation sequence. >> 2654 FE00; Unqualified chesspiece; # WHITE CHESS KING >> 2654 FE01; Chesspiece on white; # WHITE CHESS KING >> 2654 FE02; Chesspiece on black; # WHITE CHESS KING >> >> (that is: >> >> sub uni2654 uniFE00 by uni2654 ; >> sub uni2654 uniFE01 by uni2654FE02 ; >> sub uni2654 uniFE02 by uni2654FE01 ;) >> >> But I didn't see any need for that, since 2654 is already the >> unqualified chesspiece. If there's a formal need for triplets rather >> than couplets here, I'll conform to it, but that seems to be >> incidental to the robustness of the proposal. > > It's an incidental detail, but if needed someone will have to attend to it. U+2654 is simply the chesspiece; a font that only had variants for white and 'black' backgrounds could nominate either as the glyph for U+2654 on its own. No, again, it's not right to say that chess pieces on their own have to be the width of an em square, and this would disrupt their use in ordinary text. 
Here are the metrics for the pieces in Ludus: >> If a font doesn't support a glyph or a sequence, then operating systems substitute other glyphs or the .notdef glyph or whatever, no? > > No. > > First of all, the substitution mechanism is usually above the operating system layer, with varying degrees of application control. Well, yes, OpenType is handled by the font and by the app knowing that the OpenType tables are there. > Secondly, the mechanism can only look for a substitute if it knows that the glyph is missing. The macOS does this quite reliably. If Baskerville has no chess piece, but Ludus does, then a text in Baskerville will usually display the Ludus glyph. You can override this by selecting the Ludus glyph and forcing it back to Baskerville and then you get a box or other substitution glyph. > If it's looking for an OpenType font for a glyph of the family , Or any OpenType substitution string. > the obvious mechanism is to consult the cmap format 14 subtable. The font gives no indication of what glyph families the font's default rendering of U+82A6 is supposed to belong to. I don't really find us in disagreement... Michael Everson From everson at evertype.com Mon Apr 3 07:42:43 2017 From: everson at evertype.com (Michael Everson) Date: Mon, 3 Apr 2017 13:42:43 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> Message-ID: <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> On 2 Apr 2017, at 19:43, Asmus Freytag wrote: > It's a matter of perspective. > > Higher-level semantic constructs are encoded in writing (or graphic notation), and you can see the individual marks, signs, letters and symbols as the element of this encoding. 
However, how strongly any of these marks, signs, letters and symbols are associated with a specific semantic, and how fixed that association is, depends on convention. > Asmus, I don't follow this abstraction of yours. The proposal is simple. The proposal works when OpenType substitutions of "piece" plus "VS" are in the font and when an app can display such a substitution. > For example, "left arrow" has a very loose association with a broad range of concepts that somehow relate to direction. > In contrast, "integral sign" is rarely associated with any concept outside calculus. And chess piece characters are symbols which mean chess pieces. > It's tempting then, to assume that the character for "integral sign" somehow directly represents the semantic of "integration" --- except it doesn't. > > The same indirection is at play here. This is pure rhetoric, Asmus. It addresses the problem in no way. > My dislike for using variation sequences in the way Michael appears to advocate is based on a different reason: This is almost funny. Ordinarily I dislike variation sequences because I consider them pseudo-encoding. > the oft-stated fact that variation selectors may be ignored. I'm aware of this. I may be wrong, but I believe you advocated for the encoding of variation sequences for mathematics purposes. > If they are, any plain text that depends on the contrasting use of white and black chess background will become meaningless gibberish. This is untrue. Did you not read the proposal? Look again at Figure 3. In the left-hand column, the top example is only one of the several ASCII-based ways that chess fonts represent chessboards today (without any Unicode chess characters at all). It is legible only if that particular font is loaded. The middle example in the same column is not very good looking. But it is stable, parseable, exchangeable data which gives unique tokens for the empty squares in two colours and which contains the chess characters. 
It's not "meaningless gibberish" and it's not even very difficult to read. Same for the bottom example, which has been force-justified to facilitate legibility; while that font has visible glyphs for the variation selectors, it needn't. > In these cases, explicit encoding would better cover what is desired: a reliable way to mark a distinction between different symbols (the two bishops are separate symbols, that also happen to express distinct, though related concepts -- it is not a single symbol with some ignorable attributes). Well, Asmus, if by "explicit encoding" you mean "add more chess characters" this would require the trebling of the number of basic chess characters from 12 to 36. You couldn't get away with adding just six chesspieces-on-black because then fonts would be forced to draw all the chesspieces-on-white with the same em-square metrics needed to produce chessboards. But that would mean that nobody could use the ordinary chess pieces as just symbols in plain text (as seen in Figures 6 and 8). I do not believe that burdening chess users with having to use different fonts for in-text characters on the one hand and board-layout on the other is a good idea, particularly when both forms of presentation are the norm in chess-problem publishing. Further, it would delay implementation of a chessboard solution till the summer of 2019 for no benefit, since the proposal here is simple to implement with nothing more than care on the part of the font designer. And when in the past encoding pieces-on-black has been suggested, the answer has been: no, use a higher-level protocol. This proposal is a robust and simple higher-level protocol. It enables the preparation of parseable chessboards without having to add characters, or without the problem of having pieces-for-use-in-text looking nearly identical to pieces-for-use-on-white-squares. 
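A minimal Python sketch of what "parseable" could mean in practice: a board row is split into base characters, each carrying an optional variation selector. The selector-to-background mapping is an assumption, copied from the FE00/FE01/FE02 triplet quoted earlier in this message; the proposal itself may assign the selectors differently. `BACKGROUNDS`, `tokenize_row` and `strip_selectors` are hypothetical names used only for illustration.

```python
import unicodedata

# Assumed mapping, copied from the triplet quoted in this thread;
# the proposal's actual selector assignment may differ.
BACKGROUNDS = {
    "\uFE00": "unqualified",
    "\uFE01": "on white",
    "\uFE02": "on black",
}


def tokenize_row(row):
    """Return (character name, background or None) pairs for one board row."""
    tokens = []
    for ch in row:
        if ch in BACKGROUNDS:
            # A selector must follow a base character that has no selector yet.
            if not tokens or tokens[-1][1] is not None:
                raise ValueError("variation selector without a base character")
            tokens[-1] = (tokens[-1][0], BACKGROUNDS[ch])
        else:
            tokens.append((unicodedata.name(ch, "U+%04X" % ord(ch)), None))
    return tokens


def strip_selectors(row):
    """Simulate a process that silently drops variation selectors."""
    return "".join(ch for ch in row if ch not in BACKGROUNDS)


row = "\u2654\uFE01\u265F\uFE02\u2657"
print(tokenize_row(row))
print(tokenize_row(strip_selectors(row)))
```

After stripping, the piece characters survive but every background comes back as `None`, which is the trade-off being argued over in this thread.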
> Now, for the case of suggesting the chess-board cell dimensions, I do not have the same objection to the use of variation selectors. If the variation selectors get stripped, the text may require manual formatting to look correct, but it will still contain the correct symbols (and applying the chosen convention, you will be able to know which bishop is meant). > > That's much closer to the way variation selectors are intended to be used. What? You are very unclear here. Are you saying that the empty white and black squares should use VS but the chess pieces should not? That makes no sense to me at all. Michael Everson From everson at evertype.com Mon Apr 3 07:50:06 2017 From: everson at evertype.com (Michael Everson) Date: Mon, 3 Apr 2017 13:50:06 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <20170402185251.58b95878@JRWUBU2> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402185251.58b95878@JRWUBU2> Message-ID: <46C3463F-ECCB-4F85-BA78-F73BD7F70A66@evertype.com> On 2 Apr 2017, at 18:52, Richard Wordingham wrote: > > You forgot the most important setting though - that the higher-order protocols allow symbols to be displayed left-to-right. If the direction should happen to be right-to-left, not only is the game mirrored, but the board edges don't work properly, as the glyphs are not mirrored. One needs each bidi-paragraph to be forced to the correct order, e.g. by use of LRM before and after, or, if the board is recorded right-to-left, RLM or ALM before and after. None of the characters listed in §3 has a mirroring property. 
Michael Everson From kent.karlsson14 at telia.com Mon Apr 3 09:41:58 2017 From: kent.karlsson14 at telia.com (Kent Karlsson) Date: Mon, 03 Apr 2017 16:41:58 +0200 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <46C3463F-ECCB-4F85-BA78-F73BD7F70A66@evertype.com> Message-ID: On 2017-04-03 14:50, "Michael Everson" wrote: > On 2 Apr 2017, at 18:52, Richard Wordingham > wrote: >> >> You forgot the most important setting though - that the higher-order >> protocols allow symbols to be displayed left-to-right. If the direction >> should happen to be right-to-left, not only is the game mirrored, but the >> board edges don't work properly, as the glyphs are not mirrored. One needs >> each bidi-paragraph to be forced to the correct order, e.g. by use of LRM >> before and after, or, if the board is recorded right-to-left, RLM or ALM >> before and after. > > None of the characters listed in §3 has a mirroring property. Right, but most of them have bidi property ON (other neutral), so in a right-to-left context, the chess board characters will be reversed (on each line, but the VSs (which are NSM) still go with their base). This would 1) mirror the chess *board* display (but not the chess *piece* glyphs) 2) mess up the corner glyphs, which are not mirrored; and also the RIGHT/LEFT ONE EIGHTH BLOCK glyphs, which aren't mirrored either. Issue 2 will result in ugly display. Issue 1 will confuse the reader, mirroring the entire chess board (if one disregards the ugly display of the corner and left/right borders). Hence the chess board lines should be displayed in a strong left-to-right context (either via bidi markup characters, or via some higher order bidi markup mechanism, such as the "dir" attribute in HTML). Though in most cases (not Arabic/Hebrew/... document), the bidi context will default to left-to-right... 
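The properties mentioned above can be checked with Python's standard `unicodedata` module. The sketch below is only illustrative: the sample characters are one chess piece, one variation selector and the two one-eighth blocks named in this message (the full list in the proposal is not reproduced here), and `wrap_ltr` is a hypothetical helper that follows the "LRM before and after" suggestion quoted above.

```python
import unicodedata

LRM = "\u200E"  # LEFT-TO-RIGHT MARK, a strong left-to-right character


def bidi_summary(chars):
    """Map each character name to (Bidi_Class, Bidi_Mirrored as 0 or 1)."""
    return {
        unicodedata.name(c): (unicodedata.bidirectional(c), unicodedata.mirrored(c))
        for c in chars
    }


def wrap_ltr(row):
    """Put LRM on both sides so the neutrals in between resolve left-to-right."""
    return LRM + row + LRM


# WHITE CHESS KING, VARIATION SELECTOR-1, RIGHT and LEFT ONE EIGHTH BLOCK
sample = "\u2654\uFE00\u2595\u258F"
for name, (bidi_class, mirrored) in bidi_summary(sample).items():
    print(name, bidi_class, mirrored)
```

The symbols report class ON and the selector NSM, none of them mirrored, which matches the concern raised here: in a right-to-left paragraph the row order flips unless something strong, such as the surrounding LRMs, pins it down.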
For cut-and-paste to work well also when pasting to a right-to-left context document, bidi markup characters are probably better than using a higher-level attribute. I think that is why Richard argues for using bidi characters to make the lines strong left-to-right (without having to surround each chess board line with visible strong l-t-r characters). You might argue for making the board corner and board left/right border characters strong l-t-r. Not sure if that would sit well with the UTC... /Kent K From gerrietm at icloud.com Mon Apr 3 02:12:51 2017 From: gerrietm at icloud.com (Gerriet M. Denkmann) Date: Mon, 3 Apr 2017 14:12:51 +0700 Subject: Combining Class of Thai Nonspacing_Marks Message-ID: <1D352E5A-C506-4DC4-8F91-4E0100522384@icloud.com> The Combining Class is used for normalisation of strings. Normalisation of strings is important for filenames in filesystems. As far as I know, a Thai consonant (Lo, Other_Letter) can have several Nonspacing_Marks. This cluster of nonspacing marks can contain at most one top/bottom vowel and at most one tone/other mark. There is no syntactic meaning in the order of these nonspacing marks. So: All top/bottom vowels should have Combining Class 103, all tone/other marks have Combining Class 107. Is there a reason for having top vowels or other-marks with Combining Class 0, Not_Reordered? With the current choice of Combining Class both consonant + mark + top vowel and consonant + top vowel + mark count as normalised (neither is reordered into the other), so that one can have two files with these (identically looking, but different) names, which is rather confusing. Here is a list of all nonspacing marks in the Thai script: top vowels (Combining Class 0, Not_Reordered; this seems to be wrong, should be 103): THAI CHARACTER MAI HAN-AKAT (U+0E31); THAI CHARACTER SARA I (U+0E34); THAI CHARACTER SARA II (U+0E35); THAI CHARACTER SARA UE (U+0E36); THAI CHARACTER SARA UEE (U+0E37). bottom vowels (Combining Class 103): THAI CHARACTER SARA U (U+0E38); THAI CHARACTER SARA UU (U+0E39). 
tone-marks (Combining Class 107): THAI CHARACTER MAI EK (U+0E48); THAI CHARACTER MAI THO (U+0E49); THAI CHARACTER MAI TRI (U+0E4A); THAI CHARACTER MAI CHATTAWA (U+0E4B). other-marks (Combining Class 0, Not_Reordered; this seems to be wrong, should be 107): THAI CHARACTER MAITAIKHU (U+0E47); THAI CHARACTER THANTHAKHAT (U+0E4C); THAI CHARACTER NIKHAHIT (U+0E4D); THAI CHARACTER YAMAKKAN (U+0E4E). other-marks (Combining Class 9, Virama): THAI CHARACTER PHINTHU (U+0E3A). Gerriet. From asmusf at ix.netcom.com Mon Apr 3 11:16:11 2017 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Mon, 3 Apr 2017 09:16:11 -0700 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> Message-ID: <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> On 4/3/2017 5:42 AM, Michael Everson wrote: Read to the end. > On 2 Apr 2017, at 19:43, Asmus Freytag wrote: > >> It's a matter of perspective. >> >> Higher-level semantic constructs are encoded in writing (or graphic notation), and you can see the individual marks, signs, letters and symbols as the element of this encoding. However, how strongly any of these marks, signs, letters and symbols are associated with a specific semantic, and how fixed that association is, depends on convention. >> Asmus, I don't follow this abstraction of yours. The proposal is simple. The proposal works when OpenType substitutions of "piece" plus "VS" are in the font and when an app can display such a substitution. >> For example, "left arrow" has a very loose association with a broad range of concepts that somehow relate to direction. >> In contrast, "integral sign" is rarely associated with any concept outside calculus. 
> And chess piece characters are symbols which mean chess pieces. > >> It's tempting then, to assume that the character for "integral sign" somehow directly represents the semantic of "integration" --- except it doesn't. >> >> The same indirection is at play here. > This is pure rhetoric, Asmus. It addresses the problem in no way. Actually it does. I'm amazed that you don't see the connection. > >> My dislike for using variation sequences in the way Michael appears to advocate is based on a different reason: > This is almost funny. Ordinarily I dislike variation sequences because I consider them pseudo-encoding. > >> the oft-stated fact that variation selectors may be ignored. > I'm aware of this. I may be wrong, but I believe you advocated for the encoding of variation sequences for mathematics purposes. Yes, for those cases where the differences are known to not carry meaning, but where duplicating all fonts or duplicating the characters would have been the wrong solution to allow support for both conventions (e.g. upright vs. slanted integral signs, details of relational operator design, etc.). > >> If they are, any plain text that depends on the contrasting use of white and black chess background will become meaningless gibberish. > This is untrue. Did you not read the proposal? Look again at Figure 3. In the left-hand column, the top example is only one of the several ASCII-based ways that chess fonts represent chessboards today (without any Unicode chess characters at all). It is legible only if that particular font is loaded. The middle example in the same column is not very good looking. But it is stable, parseable, exchangeable data which gives unique tokens for the empty squares in two colours and which contains the chess characters. It's not "meaningless gibberish" and it's not even very difficult to read. 
Same for the bottom example, which has been force-justified to facilitate legibility; while that font has visible glyphs for the variation selectors, it needn't. > >> In these cases, explicit encoding would better cover what is desired: a reliable way to mark a distinction between different symbols (the two bishops are separate symbols, that also happen to express distinct, though related concepts -- it is not a single symbol with some ignorable attributes). > Well, Asmus, if by "explicit encoding" you mean "add more chess characters" this would require the trebling of the number of basic chess characters from 12 to 36. You couldn't get away with adding just six chesspieces-on-black because then fonts would be forced to draw all the chesspieces-on-white with the same em-square metrics needed to produce chessboards. But that would mean that nobody could use the ordinary chess pieces as just symbols in plain text (as seen in Figures 6 and 8). I do not believe that burdening chess users with having to use different fonts for in-text characters on the one hand and board-layout on the other is a good idea, particularly when both forms of presentation are the norm in chess-problem publishing. > > Further, it would delay implementation of a chessboard solution till the summer of 2019 for no benefit, since the proposal here is simple to implement with nothing more than care on the part of the font designer. > > And when in the past encoding pieces-on-black has been suggested, the answer has been: no, use a higher-level protocol. > > This proposal is a robust and simple higher-level protocol. It enables the preparation of parseable chessboards without having to add characters, or without the problem of having pieces-for-use-in-text looking nearly identical to pieces-for-use-on-white-squares. > >> Now, for the case of suggesting the chess-board cell dimensions, I do not have the same objection to the use of variation selectors. 
If the variation selectors get stripped, the text may require manual formatting to look correct, but it will still contain the correct symbols (and applying the chosen convention, you will be able to know which bishop is meant). >> >> That's much closer to the way variation selectors are intended to be used. > What? You are very unclear here. Are you saying that the empty white and black squares should use VS but the chess pieces should not? That makes no sense to me at all. I'm saying that perhaps it would be appropriate to select em-square glyph variants via a variation selector. That seems a clear-cut glyph *variation* to me. (If this variation is ignored, then the text looks bad, but in a way that is similar to selecting the wrong font - which is a rule-of-thumb way of evaluating whether variation selectors are appropriate). The distinction between white/black background might be of a different nature. If you have arranged everything in a grid with the correct matrix, then the color of the background is perhaps redundant, given that there is a uniform convention for it. If you assume the characters will ever be used outside a full grid, then that assumption fails and it will not be possible to restore the intended meaning if the variation selectors are missing. That's a warning flag that they may not be appropriate for that use. That's all. A./ > > Michael Everson > From asmusf at ix.netcom.com Mon Apr 3 11:18:28 2017 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Mon, 3 Apr 2017 09:18:28 -0700 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> Message-ID: <88b9a791-4d5c-534f-160a-ec0f521135d1@ix.netcom.com> An HTML attachment was scrubbed... 
URL: From asmusf at ix.netcom.com Mon Apr 3 11:40:05 2017 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Mon, 3 Apr 2017 09:40:05 -0700 Subject: Combining Class of Thai Nonspacing_Marks In-Reply-To: <1D352E5A-C506-4DC4-8F91-4E0100522384@icloud.com> References: <1D352E5A-C506-4DC4-8F91-4E0100522384@icloud.com> Message-ID: <6f999b27-654d-1220-e710-cc170de82a37@ix.netcom.com> On 4/3/2017 12:12 AM, Gerriet M. Denkmann wrote: > The Combining Class is used for normalisation of strings. > Normalisation of strings is important for filenames in filesystems. The same issues apply to network identifiers. > > As far as I know, a Thai consonant (Lo, Other_Letter) can have several Nonspacing_Marks. > This cluster of nonspacing marks can contain at most one top/bottom vowel and at most one tone/other mark. > There is no syntactic meaning in the order of these nonspacing marks. > > So: All top/bottom vowels should have Combining Class 103, all tone/other marks have Combining Class 107. > > Is there a reason for having top vowels or other-marks with Combining Class 0, Not_Reordered? > > With the current choice of Combining Class both consonant + mark + top vowel and consonant + top vowel + mark count as normalised (neither is reordered into the other), so that one can have two files with these (identically looking, but different) names, which is rather confusing. It is not possible to construct a set of secure network identifiers based on simply a) ensuring the string is in NFC b) otherwise allowing all of the Thai characters (insofar as they are PVALID in IDNA 2008 [RFC5892]). Considerable attention to allowable contexts is required. There is a group in Thailand working on this, but their results have not yet been made public. 
Similar work for Khmer and Lao can be found here: https://www.icann.org/en/system/files/files/proposal-khmer-lgr-15aug16-en.pdf https://www.icann.org/en/system/files/files/proposal-lao-lgr-31jan17-en.pdf A./ > > Here is a list of all nonspacing marks in the Thai script: > > top vowels (Combining Class 0, Not_Reordered; this seems to be wrong, should be 103): > THAI CHARACTER MAI HAN-AKAT (U+0E31) > THAI CHARACTER SARA I (U+0E34) > THAI CHARACTER SARA II (U+0E35) > THAI CHARACTER SARA UE (U+0E36) > THAI CHARACTER SARA UEE (U+0E37) > > bottom vowels (Combining Class 103): > THAI CHARACTER SARA U (U+0E38) > THAI CHARACTER SARA UU (U+0E39) > > tone-marks (Combining Class 107): > THAI CHARACTER MAI EK (U+0E48) > THAI CHARACTER MAI THO (U+0E49) > THAI CHARACTER MAI TRI (U+0E4A) > THAI CHARACTER MAI CHATTAWA (U+0E4B) > > other-marks (Combining Class 0, Not_Reordered; this seems to be wrong, should be 107): > THAI CHARACTER MAITAIKHU (U+0E47) > THAI CHARACTER THANTHAKHAT (U+0E4C) > THAI CHARACTER NIKHAHIT (U+0E4D) > THAI CHARACTER YAMAKKAN (U+0E4E) > > other-marks (Combining Class 9, Virama): > THAI CHARACTER PHINTHU (U+0E3A) > > Gerriet. > > > From markus.icu at gmail.com Mon Apr 3 12:51:15 2017 From: markus.icu at gmail.com (Markus Scherer) Date: Mon, 3 Apr 2017 10:51:15 -0700 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <88b9a791-4d5c-534f-160a-ec0f521135d1@ix.netcom.com> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <88b9a791-4d5c-534f-160a-ec0f521135d1@ix.netcom.com> Message-ID: It seems to me that higher-level layout (e.g., HTML+CSS) is appropriate for the board layout (e.g., via a table), board frame style, and cell/field shading. In each field, the existing characters should suffice. markus -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wjgo_10009 at btinternet.com Mon Apr 3 13:13:01 2017 From: wjgo_10009 at btinternet.com (William_J_G Overington) Date: Mon, 3 Apr 2017 19:13:01 +0100 (BST) Subject: Tags and custom vector glyph emoji (from Re: Tailoring the Marketplace (is: Re: Unicode Emoji 5.0 characters now final)) In-Reply-To: <11364706.56745.1491240615392.JavaMail.root@webmail12.bt.ext.cpcloud.co.uk> References: <11364706.56745.1491240615392.JavaMail.root@webmail12.bt.ext.cpcloud.co.uk> Message-ID: <7794797.61788.1491243181509.JavaMail.defaultUser@defaultHost> Peter Constable wrote: > William, you completely miss the point: As long as Unicode is the way to provide emoji to consumers, their needs and desires will not be best or fully met. Unicode as an AND gate is too many AND gates. Ah, I understand what you mean now. In my feedback of 7 March 2017 to PRI #348 on the Length of Tag Sequences I included the following. quote .... for example, a vector glyph in a platform-independent colour-font-style contour format could be expressed using tag characters. end quote Following your post and my now understanding your meaning I have written some notes about the above possibility. Previously I have made some colour fonts using the High-Logic FontCreator program. I do not claim to be expert on the OpenType colour font format, yet I know about the idea of having several glyphs with each such glyph being of one colour and then combining them to produce a colourful glyph and I also know about the option to include a default monochrome glyph. I enjoy trying to devise encoding systems, so I have tried to produce a way to send the information for a colourful glyph within a tag sequence. I am thinking that a future email or text message reception system could decode the tag sequence and add a colourful glyph to the font being used to display the message. This method, if people can get it to work satisfactorily, would allow custom vector glyph emoji within an interoperable plain text system. 
Here is a transcript of what I have produced so far. Readers of this thread are invited to have a look at the idea and are welcome to try to implement it if they so choose. If any additions are needed, or indeed if any changes are needed, please say. There needs to be a way so that the tag sequence for the glyph for a particular character is only sent once in a message even though the character may be used more than once in the message. Tags and custom vector glyph emoji Some notes as at Monday 3 April 2017 19:04 pm British Summer Time A tag sequence for this purpose starts with a capital letter V standing for vector format. At the start of the sequence a:=255; b:=0; g:=0; m:=1; p:=0; r:=0; x:=0; y:=0; w:=1000; At the start of the sequence the points buffer is empty, the contours buffer is empty and the glyphs buffer is empty. The system uses a special-purpose virtual computing engine within a software sandbox. The special-purpose virtual computing engine has no commands for loops and is a single pass interpretative system. ---- Letters that are each used both as a command and also as the name of a register in the special-purpose virtual computing engine. a means {a:=p; p:=0; m:=0;} b means {b:=p; p:=0; m:=0;} g means {g:=p; p:=0; m:=0;} m means {m:=1;} p means {p:=0;} r means {r:=p; p:=0; m:=0;} x means {x:=p; p:=0;} y means {y:=p; p:=0;} w means {w:=p; p:=0;} ---- Letters that are used as a command but not as the name of a register in the special-purpose virtual computing engine. 
c means {define a closed contour from the points in the points buffer; clear the points buffer ready for the next point; x:=0; y:=0; p:=0;} d means {define a glyph from the contour or contours in the contours buffer, if m=1 then the glyph is the first glyph and is the monochrome glyph, else the glyph is of colour (r, g, b, a) and is not the first glyph; clear the contours buffer ready for the next glyph; clear the points buffer ready for the next point; a:=255; b:=0; g:=0; r:=0; x:=0; y:=0; p:=0; m:=0;} The use of the m register is so that a default monochrome glyph may optionally be included as the first glyph defined. If any component of the colour or opacity is defined before a d command is used, then the monochrome component is left empty. f means {define an off curve point using x and y; x:=0; y:=0; p:=0;} h means {define a complete glyph of advance width w from the glyph or glyphs in the glyphs buffer and have it ready for access by the main program; halt;} n means {define an on curve point using x and y; x:=0; y:=0; p:=0;} ---- Digits Digits 0 .. 9 each mean p:=10*p + (digit); The system is designed to be notionally for an emoji glyph within a virtual space of (x from 0 .. 1000 and y from 0 .. 1000). These values may be scaled to fit with the metrics of a real world font with which a glyph communicated using this system is applied. ---- A tag sequence for this purpose ends with a cancel tag. ---- Some basic examples of parts of a tag sequence to provide an idea of how the system would be used. The following part of a tag sequence would set the x register to have the value 250. 250x The following part of a tag sequence would define an on-curve point at (x,y) = (250, 900) 250x900yn The following part of a tag sequence would define a contour. 250x900yn800x500yf250x100ync The following part of a tag sequence would define a colour glyph that has one contour. 
250x900yn800x500yf250x100ync255b128gd ---- Conclusion It seems that it would be possible for such a system to work, though the tag sequences would be quite long, yet the system could allow a colourful glyph to be expressed in an interoperable plain text format without needing any file attachment to the plain text sequence. William Overington Monday 3 April 2017 From kent.karlsson14 at telia.com Mon Apr 3 13:46:17 2017 From: kent.karlsson14 at telia.com (Kent Karlsson) Date: Mon, 03 Apr 2017 20:46:17 +0200 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: Message-ID: Den 2017-04-03 19:51, skrev "markus.icu at gmail.com" : > It seems to me that higher-level layout (e.g, HTML+CSS) is appropriate for the > board layout (e.g., via a table), board frame style, and cell/field shading. > In each field, the existing characters should suffice. > > markus True, and one can easily find an example online. Slightly modified from http://stackoverflow.com/questions/18505921/chess-using-tables
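The attachment itself was scrubbed from the archive, so the following is not that example. It is a minimal Python sketch of the higher-level approach Markus describes: an HTML table supplies the grid and the shading, and each cell holds nothing but an existing chess character. The function name `board_to_html`, the colours and the cell size are illustrative assumptions.

```python
# Starting position, rank 8 first; a space stands for an empty square.
START = [
    "♜♞♝♛♚♝♞♜",
    "♟♟♟♟♟♟♟♟",
    " " * 8,
    " " * 8,
    " " * 8,
    " " * 8,
    "♙♙♙♙♙♙♙♙",
    "♖♘♗♕♔♗♘♖",
]


def board_to_html(rows):
    """Render 8 strings of 8 characters as an HTML table with shaded cells."""
    lines = ['<table style="border-collapse:collapse;border:2px solid black">']
    for r, row in enumerate(rows):
        cells = []
        for c, piece in enumerate(row):
            # With rank 8 listed first, the top-left square (a8) is light.
            colour = "#fff" if (r + c) % 2 == 0 else "#999"
            cells.append(
                '<td style="width:1.5em;height:1.5em;text-align:center;'
                'background:%s">%s</td>' % (colour, piece.strip())
            )
        lines.append("<tr>" + "".join(cells) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


print(board_to_html(START))
```

Here the shading lives entirely in the markup, so the cell contents need no variation selectors; the cost is that the board is no longer plain text.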


-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From richard.wordingham at ntlworld.com Mon Apr 3 14:19:43 2017
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Mon, 3 Apr 2017 20:19:43 +0100
Subject: Combining Class of Thai Nonspacing_Marks
In-Reply-To: <1D352E5A-C506-4DC4-8F91-4E0100522384@icloud.com>
References: <1D352E5A-C506-4DC4-8F91-4E0100522384@icloud.com>
Message-ID: <20170403201943.28d8e721@JRWUBU2>

On Mon, 3 Apr 2017 14:12:51 +0700
"Gerriet M. Denkmann" wrote:

> The Combining Class is used for normalisation of strings.
> Normalisation of strings is important for filenames in filesystems.
>
> As far as I know, a Thai consonant (Lo, Other_Letter) can have
> several Nonspacing_Marks. This cluster of nonspacing marks can
> contain at most one top/bottom vowel and at most one tone/other mark.
> There is no syntactically meaning in the order of these nonspacing
> marks.

You're confusing the modern Thai language with the Thai script. It seems that the Lao-style usage of NIKHAHIT as a vowel is known from older Thai writing, and when used this way it could of course take a tone mark. It also seems that the pressure to have both MAITAIKHU and a tone mark on a consonant has been accepted for at least one minority language.

> So: All top/bottom vowels should have Combining Class 103, all
> tone/other marks have Combining Class 107.
> Is there a reason for having top vowels or other-marks with Combining
> Class 0, Not_Reordered?

It does make one wonder if someone hated Thais. It would have been a lot simpler, and have worked better, if the combining classes for Latin diacritics had been used. As it is, one common combination of vowel below and mark above was catered for - SARA U/UU with tone mark. The system doesn't even cater for SARA U + THANTHAKHAT, as in พันธุ์ทิพย์ 'Phanthip'.
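To make the point concrete, here is a small Python check using the standard `unicodedata` module. It is only a sketch: the values come from whatever Unicode version ships with the interpreter (the Thai combining classes have long been stable), and the helper name `same_after_nfc` is made up for the example. It shows the classes discussed above, and that normalisation reorders SARA U with a tone mark but treats the two orders of SARA U with THANTHAKHAT, and of PHINTHU with SARA I, as distinct strings.

```python
import unicodedata

KO_KAI = "\u0E01"       # THAI CHARACTER KO KAI, used here as the base consonant
SARA_I = "\u0E34"       # vowel above, combining class 0
SARA_U = "\u0E38"       # vowel below, combining class 103
PHINTHU = "\u0E3A"      # virama, combining class 9
MAI_EK = "\u0E48"       # tone mark, combining class 107
THANTHAKHAT = "\u0E4C"  # cancellation mark, combining class 0


def same_after_nfc(a, b):
    """True if the two strings are identical once both are normalised to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    for ch in (SARA_I, SARA_U, PHINTHU, MAI_EK, THANTHAKHAT):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: ccc={unicodedata.combining(ch)}")

    # Catered for: classes 103 and 107 are both non-zero, so the marks get sorted.
    print(same_after_nfc(KO_KAI + SARA_U + MAI_EK, KO_KAI + MAI_EK + SARA_U))

    # Not catered for: THANTHAKHAT has class 0, so nothing is reordered.
    print(same_after_nfc(KO_KAI + SARA_U + THANTHAKHAT, KO_KAI + THANTHAKHAT + SARA_U))

    # Likewise SARA I has class 0, so PHINTHU cannot move across it.
    print(same_after_nfc(KO_KAI + PHINTHU + SARA_I, KO_KAI + SARA_I + PHINTHU))
```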
The use of values peculiar to Thai (103 and 107) does not help when minority languages use Latin diacritics, such as U+0331 COMBINING MACRON BELOW and U+0303 COMBINING TILDE for Pattani Malay. The viramas that were recognised were given combining class 9; YAMAKKAN and THANTHAKHAT were overlooked.

One of the looming problems is that several languages use a combination of PHINTHU and SARA I - both orders are used, though they are not canonically equivalent.

Richard.

From richard.wordingham at ntlworld.com Mon Apr 3 14:33:55 2017
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Mon, 3 Apr 2017 20:33:55 +0100
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com>
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com>
Message-ID: <20170403203355.6cbfc184@JRWUBU2>

On Sun, 2 Apr 2017 10:43:39 -0700
Asmus Freytag wrote:

> In these cases, explicit encoding would better cover what is desired:
> a reliable way to mark a distinction between different symbols (the
> two bishops are separate symbols, that also happen to express
> distinct, though related concepts -- it is not a single symbol with
> some ignorable attributes).

There was no intention to encode the bishops separately. It just happens that the rules of chess allow one to distinguish the bishops simply by recording the colour of the square they are currently on.

The basic text elements in the scheme other than boundary markers will be:

empty white square
empty black square
white square with specific piece on it
black square with specific piece on it.
If the variation selectors are ignored, these simplify to:

white square
hatched square
specific piece

This preserves all the information; the pattern of squares is known in advance and therefore redundant.

Richard.

From asmusf at ix.netcom.com Mon Apr 3 14:58:46 2017
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Mon, 3 Apr 2017 12:58:46 -0700
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <20170403203355.6cbfc184@JRWUBU2>
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2>
Message-ID: 

An HTML attachment was scrubbed...
URL: 

From olopierpa at gmail.com Mon Apr 3 15:04:54 2017
From: olopierpa at gmail.com (Pierpaolo Bernardi)
Date: Mon, 3 Apr 2017 22:04:54 +0200
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <20170403203355.6cbfc184@JRWUBU2>
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2>
Message-ID: 

On Mon, Apr 3, 2017 at 9:33 PM, Richard Wordingham wrote:
> On Sun, 2 Apr 2017 10:43:39 -0700
> Asmus Freytag wrote:
>
>> In these cases, explicit encoding would better cover what is desired:
>> a reliable way to mark a distinction between different symbols (the
>> two bishops are separate symbols, that also happen to express
>> distinct, though related concepts -- it is not a single symbol with
>> some ignorable attributes).
>
> There was no intention to encode the bishops separately. It just
> happens that the rules of chess allow one to distinguish the bishops
> simply by recording the colour of the square they are currently on.
The rules of chess don't allow this. While at the start of a game there are two bishops per player with this property, there are ways to obtain more bishops. One player, say, can have four bishops, all of them on light squares.

This does not happen (usually :) in chess *games*, but it may happen in problems, puzzles, and retroanalysis. Even in standard games, it's not forbidden by the rules, so it's wrong to assume it can't happen.

From verdy_p at wanadoo.fr Mon Apr 3 15:42:43 2017
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Mon, 3 Apr 2017 22:42:43 +0200
Subject: Tags and custom vector glyph emoji (from Re: Tailoring the Marketplace (is: Re: Unicode Emoji 5.0 characters now final))
In-Reply-To: <7794797.61788.1491243181509.JavaMail.defaultUser@defaultHost>
References: <11364706.56745.1491240615392.JavaMail.root@webmail12.bt.ext.cpcloud.co.uk> <7794797.61788.1491243181509.JavaMail.defaultUser@defaultHost>
Message-ID: 

2017-04-03 20:13 GMT+02:00 William_J_G Overington :

>
> A tag sequence for this purpose starts with a capital letter V standing
> for vector format.
>
> At the start of the sequence a:=255; b:=0; g:=0; m:=1; p:=0; r:=0; x:=0;
> y:=0; w:=1000;
>
> At the start of the sequence the points buffer is empty, the contours
> buffer is empty and the glyphs buffer is empty.
>
> The system uses a special-purpose virtual computing engine within a
> software sandbox. The special-purpose virtual computing engine has no
> commands for loops and is a single pass interpretative system.
>
> [...]

What you are describing is reinventing the wheel: it is basically what SVG paths already define. But an emoji is not just a path: it also has fill colors, stroke styles (which may be converted to fill-only paths by computing the inferred geometries), and smoothing effects (color shading, since fills are not necessarily uniform).
There are attempts to create a superset of SVG paths to represent this in a more compact form with additional instructions; these are used to create "subroutines" or shared forms, affine transforms, geometric derivations (line width, dashes, bevel or rounded join types), and color masks (possibly with repeated patterns and alpha transparency). Every attempt to extend this has become a nightmare because there were too many objectives to follow. In the end everyone uses SVG directly, even though it is currently XML-encoded. More successful representations use JSON instead of XML, without breaking the extensibility. Font encoding technologies define their own systems using multiple tables and a compact dictionary of tables with binary encoding, which is not suitable for inclusion in plain text.

Note also that emoji could be animated when rendered on screen (that is what we already see in many implementations using GIF icons for their emoji, even if these are not easily resizable). Animated SVG is still in beta for now but is starting to be used on some sites and rendered by web browsers. SVG images may also be scripted and may include accessibility features (e.g. a sound played or hint bubbles displayed when hovering over them).

You cover only part of what is needed, yet hope that someone will invest time to implement it in a renderer. Developers prefer to invest their time in SVG renderers or in existing font technologies for OpenType (SVG fonts will come later, when they are capable of doing the same things as OpenType; for now they do not cover all the existing needs).

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From richard.wordingham at ntlworld.com Mon Apr 3 16:03:48 2017
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Mon, 3 Apr 2017 22:03:48 +0100
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: 
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2>
Message-ID: <20170403220348.3efb4d1a@JRWUBU2>

On Mon, 3 Apr 2017 14:12:52 +0200
Michael Everson wrote:

> On 2 Apr 2017, at 18:27, Richard Wordingham wrote:

> I think you are seriously going the wrong way with this thinking. The
> immediate parallels that come to mind are things like:
>
> 1000 ? MYANMAR LETTER KA
> ? 1000 FE00 ? dotted form
>
> where the character can still be read if the variation selector’s
> glyph can’t be shown. Uniform width is a feature of CJK, sure, but
> that’s the nature of the writing system. Chess pieces for setting
> within ordinary text do NOT have to be an em wide, and they aren’t
> in fonts. Chess pieces on a white square or on a black square do have
> to have a uniform width in order to produce the board matrix.

Nobody said the glyphs for use in ordinary text had to be a fixed width. What I am saying is that the glyphs for the two new variants you are proposing need to harmonise with the block elements such as U+2581 LOWER ONE EIGHTH BLOCK. That requires uniform width *for those variants*. That is a key part of the glyph family's essence. There is no such requirement on the glyphs for normal text use as at present.

> > U+00A0 makes a lot of sense as the base character.
>
> What? NBSP and SP are whitespace characters, with complex behaviours,
> and chessboards, whether set in lead type or digitally, are sets of
> simple symbol glyphs. NBSP glues two things together. SP separates
> things. Chessboards are not collections of black squares glued
> together by white spaces with white spaces at the alternating ends of
> lines. I reject this analysis.

If one had a row of squares in flowing text, one would want the row to act like a word. One might have to resort to gluing it together using CGJ or WJ.

> > Also having variants of U+25A1 and U+25A8 that match the game
> > square filter modifiers seems quite legitimate.
>
> Um, wait? What are you proposing NBSP for? I'm confused now. If you
> like these two characters (and I am glad you do) there’s no need for
> U+00A0 at all.

To be pedantic, I said that the proposed variants were legitimate, not that I liked them.

> > Secondly, the mechanism can only look for a substitute if it knows
> > that the glyph is missing.

> The macOS does this quite reliably. If Baskerville has no chess
> piece, but Ludus does, then a text in Baskerville will usually
> display the Ludus glyph. You can override this by selecting the Ludus
> glyph and forcing it back to Baskerville and then you get a box or
> other substitution glyph.

I'm talking about looking for a U+2654 glyph for ordinary text when all the first font tried has is:

2654 FE01; Chesspiece on white; # WHITE CHESS KING
2654 FE02; Chesspiece on black; # WHITE CHESS KING

I must confess I am now wondering what the format 4 cmap should say about U+2654. Should it give a glyph for U+2654 or not? I'm also wondering about Windows behaviour. There was a time when Windows 7 only supported variation sequences if they appeared in the cmap 14 subtable.

> > If it's looking for an OpenType font for a glyph of the family
> > ,
>
> Or any OpenType substitution string.

Most won't be recognised as needed. If the first font lacks a ligature for , fallback won't be used for it. Grapheme clusters and variation sequences get special treatment.

Richard.
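The fallback question can be pictured with a toy model. This is only a sketch under stated assumptions: the `Font` class, its two dictionaries (standing in for the format 4 and format 14 cmap subtables) and the `pick_font` policy are invented for illustration, and they do not describe how macOS, Windows or any real shaping engine behaves. The selector values FE01 and FE02 are taken from the two data lines quoted above and are proposed, not standardized.

```python
from dataclasses import dataclass, field

WHITE_CHESS_KING = 0x2654
ON_WHITE = 0xFE01  # selector as written in the data lines above (proposal only)
ON_BLACK = 0xFE02  # selector as written in the data lines above (proposal only)


@dataclass
class Font:
    """Toy font: two lookup tables and nothing else (hypothetical model)."""
    name: str
    cmap: dict = field(default_factory=dict)        # code point -> glyph name
    variations: dict = field(default_factory=dict)  # (base, selector) -> glyph name


def pick_font(fonts, base, selector=None):
    """Return (font name, glyph name) from the first font able to show the element.

    Assumed policy: a font is accepted if it maps the exact variation sequence,
    or failing that the bare base character; otherwise the next font is tried.
    """
    for font in fonts:
        if selector is not None and (base, selector) in font.variations:
            return font.name, font.variations[(base, selector)]
        if base in font.cmap:
            # No selector given, or this font ignores it: use the plain glyph.
            return font.name, font.cmap[base]
    return None, ".notdef"


# A font that carries only the two board variants and no plain U+2654 mapping.
board_only = Font(
    "BoardOnly",
    variations={
        (WHITE_CHESS_KING, ON_WHITE): "king.onwhite",
        (WHITE_CHESS_KING, ON_BLACK): "king.onblack",
    },
)
text_font = Font("Text", cmap={WHITE_CHESS_KING: "king"})

if __name__ == "__main__":
    print(pick_font([board_only, text_font], WHITE_CHESS_KING))
    print(pick_font([board_only, text_font], WHITE_CHESS_KING, ON_BLACK))
    print(pick_font([board_only], WHITE_CHESS_KING))
```

Whether a real font holding only the two variation entries should also map plain U+2654 is exactly the open question raised above; the model merely shows that, without such a mapping, the plain character has to come from some other font.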
From richard.wordingham at ntlworld.com Mon Apr 3 16:12:59 2017 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Mon, 3 Apr 2017 22:12:59 +0100 Subject: Unicode 10.0 Legitimacy of 0031 FE0E 20E3 Message-ID: <20170403221259.40b3a1cb@JRWUBU2> Where in the draft databases for Unicode 10.0 is Unicode 9.0 variation sequence declared legitimate? Without such a declaration, a font that had a special glyph for or a substitution specific to would not be Unicode compliant. I hope this reflects my ignorance of the definition system rather than an error in the databases. Richard. From everson at evertype.com Mon Apr 3 16:15:16 2017 From: everson at evertype.com (Michael Everson) Date: Mon, 3 Apr 2017 22:15:16 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> Message-ID: <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> On 3 Apr 2017, at 17:16, Asmus Freytag wrote: >>> The same indirection is at play here. >> This is pure rhetoric, Asmus. It addresses the problem in no way. > Actually it does. I'm amazed that you don't see the connection. I?ve never understood you when you back up into that particular kind of abstract rhetoric. >>> the oft-stated fact that variation selectors may be ignored. >> I?m aware of this. I may be wrong, but I believe you advocated for the encoding of variation sequences for mathematics purposes. 
> > Yes, for those cases where the differences are known to not carry meaning, but where duplicating all fonts or duplicating the characters would have been the wrong solution to allow support for both conventions (e.g. upright vs. slanted integral signs, details of relational operator design, etc.). The ?meaning? of a chess-problem matrix is the whole 8 ? 8 board, not the empty dark square at b4 or the white pawn on The ?problem? the higher-level protocol is supposed to solve is the one where a chess piece of one colour sits in an em-squared zone whether light or dark. In lead type this was a glyph issue. Lead type had just exactly what my proposal has: A piece with in-line text metrics, spaced harmoniously with digits and letters, and square sorts with and without hatching. Standardized variation sequences are the best way to achieve this simply and without needless duplication. :-) >> Are you saying that the empty white and black squares should use VS but the chess pieces are not? That makes no sense to me at all. > > I'm saying that perhaps it would be appropriate to select M-square glyph variants via a variation selector. That seems a clear-cut glyph *variation* to me. (If this variation is ignored, then the text looks bad, but in a way that is similar to selecting the wrong font - which is a rule-of-thumb way of evaluating whether variation selectors are appropriate). OK, then you support the part of the proposal that applies VS1 and VS2 to the chess pieces. > The distinction between white/black background might be of a different nature. If you have arranged everything in a grid with the correct matrix, then the color of the background is perhaps redundant, given that there is a uniform convention for it. Yes but you still want it to be reasonably legible when the OpenType ligatures fail. I think that this: ?????????? ?????????????????? ?????????????????? ?????????????????? ?????????????????? ?????????????????? ?????????????????? ?????????????????? 
?????????????????? ?????????? is far better than this: ?????????? ?????????????????? ?????????????????? ?????????????????? ?????????????????? ?????????????????? ?????????????????? ??????????????????<< Is it the pawn or the queen that?s on the black square? ?????????????????? ?????????? See? To parse this one you have to remember which of the white squares are the alternating black ones. I don?t consider that as legible as using both 25A1 and 25A8. The colour of the matrix is NOT redundant for a human reader. > If you assume the characters will ever be used outside a full grid, then that assumption fails and it will not be possible to restore the intended meaning if the variation selectors are missing. That's a warning flag, that they may not be appropriate for that use. You can?t assume that they wouldn?t be. All of my examples in ?2 of the proposal are in fact outside of a full grid. I think the proposal as it stands ticks the most boxes. (I have changed ?black square? and ?white square? to ?dark square? and ?light square? however. Michael Everson From everson at evertype.com Mon Apr 3 16:16:04 2017 From: everson at evertype.com (Michael Everson) Date: Mon, 3 Apr 2017 22:16:04 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <88b9a791-4d5c-534f-160a-ec0f521135d1@ix.netcom.com> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <88b9a791-4d5c-534f-160a-ec0f521135d1@ix.netcom.com> Message-ID: <3BD3C1C7-9294-4FE9-9BFB-7FEF81706CD4@evertype.com> > On 3 Apr 2017, at 17:18, Asmus Freytag wrote: > > On 4/3/2017 5:12 AM, Michael Everson wrote: >>> I'm not convinced that it is. A player starts with two non-interchangeable bishops. could only refer the white bishop that is restricted to black squares. That's a semantic difference. >>> >> Surely not. 
If it were, we would encode WHITE BISHOP THAT STAYS ON THE WHITE SQUARES and WHITE BISHOP THAT STAYS ON BLACK SQUARES and we would encode WHITE KNIGHT THAT MOVES FROM WHITE SQUARES TO BLACK SQUARES and WHITE KNIGHT THAT MOVES FROM BLACK SQUARES TO WHITE SQUARES. >> > The non-interchangeability of bishops is a fact about chess rules. We agree. :-) > It has no business being "encoded" on the character level. We agree. :-) Michael Everson From kent.karlsson14 at telia.com Mon Apr 3 16:24:59 2017 From: kent.karlsson14 at telia.com (Kent Karlsson) Date: Mon, 03 Apr 2017 23:24:59 +0200 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: Message-ID: Den 2017-04-03 20:46, skrev "Kent Karlsson" : > > Den 2017-04-03 19:51, skrev "markus.icu at gmail.com" : > >> > It seems to me that higher-level layout (e.g, HTML+CSS) is appropriate for >> the >> > board layout (e.g., via a table), board frame style, and cell/field >> shading. >> > In each field, the existing characters should suffice. >> > >> > markus > > True, and one can easily find an example online. > > Slightly modified from > http://stackoverflow.com/questions/18505921/chess-using-tables > > [...] > A bit more modification: more colourful, even with /// striped backgrounds. One disadvantage is that the "white" pieces interior get the background colour rather than being actually white. To get them actually white (not just the interiors, but the entire pieces), use the "black"(!) pieces, and (via CSS) colour them white (need to be set on a non-white background to be visible...). I know, the latter trick will make parsing even more tricky (needing to interpret not only the HTML tag markup and chess characters, but also (say) HTML class attribute to distinguish "white" from "black" pieces). And, parsing (for other things than display in a browser), will be quite sensitive to the exact way of expressing this in HTML. 
There are many quite different ways of expressing this in HTML (+CSS). But... with a bit of JavaScript savvyness, you can program moving the pieces around... ;-) And substitute the chess characters to more emoji style images of chess pieces... Still in ;-) mode.
-------------- next part -------------- An HTML attachment was scrubbed... URL: From everson at evertype.com Mon Apr 3 16:33:53 2017 From: everson at evertype.com (Michael Everson) Date: Mon, 3 Apr 2017 22:33:53 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <88b9a791-4d5c-534f-160a-ec0f521135d1@ix.netcom.com> Message-ID: <8A987C4A-CF9E-4EA0-A3C4-99B0DD8CEFDD@evertype.com> On 3 Apr 2017, at 18:51, Markus Scherer wrote: > > It seems to me that higher-level layout (e.g, HTML+CSS) is appropriate for the board layout (e.g., via a table), board frame style, and cell/field shading. In each field, the existing characters should suffice. That isn?t plain text. This is plain text: ?????????? ?????????????????? ?????????????????? ?????????????????? ?????????????????? ?????????????????? ?????????????????? ?????????????????? ?????????????????? ?????????? I can read this in my plain-text e-mail. I can copy it from the plain-text e-mail and past it into Quark XPress as in the proposal, or into Microsoft Word v. 15 for Mac as shown below (the first one is just as-is pasted into Word; the second formatted itself when I selected the Ludus font. None of these examples uses HTML. None uses some external folder with hard-to-format css rules. None needs to be constructed by some HTML or XML
matrix. It?s all just a font with normal OpenType features, and normal use of variation sequences. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ms-word-lg.png Type: image/png Size: 101582 bytes Desc: not available URL: From liancu at microsoft.com Mon Apr 3 16:37:10 2017 From: liancu at microsoft.com (Laurentiu Iancu) Date: Mon, 3 Apr 2017 21:37:10 +0000 Subject: Unicode 10.0 Legitimacy of 0031 FE0E 20E3 In-Reply-To: <20170403221259.40b3a1cb@JRWUBU2> References: <20170403221259.40b3a1cb@JRWUBU2> Message-ID: Richard, The emoji and text presentation sequences were moved to the UTS #51 data file emoji-variation-sequences.txt, which is new in Version 5.0 of the UTS. Please see http://www.unicode.org/Public/emoji/5.0/emoji-variation-sequences.txt The move is documented on the Beta Unicode 10.0 page, http://www.unicode.org/versions/beta-10.0.0.html in the "Standardized Variation Sequences" section. Regards, L. -----Original Message----- From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Richard Wordingham Sent: Monday, April 3, 2017 2:13 PM To: unicode at unicode.org Subject: Unicode 10.0 Legitimacy of 0031 FE0E 20E3 Where in the draft databases for Unicode 10.0 is Unicode 9.0 variation sequence declared legitimate? Without such a declaration, a font that had a special glyph for or a substitution specific to would not be Unicode compliant. I hope this reflects my ignorance of the definition system rather than an error in the databases. Richard. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From markus.icu at gmail.com Mon Apr 3 16:44:28 2017 From: markus.icu at gmail.com (Markus Scherer) Date: Mon, 3 Apr 2017 14:44:28 -0700 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <8A987C4A-CF9E-4EA0-A3C4-99B0DD8CEFDD@evertype.com> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <88b9a791-4d5c-534f-160a-ec0f521135d1@ix.netcom.com> <8A987C4A-CF9E-4EA0-A3C4-99B0DD8CEFDD@evertype.com> Message-ID: On Mon, Apr 3, 2017 at 2:33 PM, Michael Everson wrote: > On 3 Apr 2017, at 18:51, Markus Scherer wrote: > > > It seems to me that higher-level layout (e.g, HTML+CSS) is appropriate for > the board layout (e.g., via a table), board frame style, and cell/field > shading. In each field, the existing characters should suffice. > > > That isn?t plain text. > A lot of stuff needed for printing books and laying out PDFs and web pages goes beyond plain text. Whose requirement is it to represent an entire chess or checkers board in plain text? Other than a sort of puzzle of "what would it take to do so?" markus -------------- next part -------------- An HTML attachment was scrubbed... URL: From everson at evertype.com Mon Apr 3 16:48:31 2017 From: everson at evertype.com (Michael Everson) Date: Mon, 3 Apr 2017 22:48:31 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <20170403203355.6cbfc184@JRWUBU2> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> Message-ID: <9B849393-FEA0-43CE-A4E2-CA63DBAAF425@evertype.com> On 3 Apr 2017, at 20:33, Richard Wordingham wrote: > There was no intention to encode the bishops separately. 
It just happens that the rules of chess allow one to distinguish the bishops > simply by recording the colour of the square they are currently on. That only works for them though. > The basic text elements in the scheme other than boundary markers will be: > > empty white square > empty black square > white square with specific piece on it > black square with specific piece on it. Or rather, in terms of the font glyphs: light square dark square specific piece surrounded by light square specific piece surrounded by dark square > If the variation selectors are ignored, these simplify to: > > white square > hatched square > specific piece > > This preserves all the information; the pattern of squares is known in advance and therefore redundant. Yes, this is what I?ve proposed. Michael Everson From everson at evertype.com Mon Apr 3 16:52:55 2017 From: everson at evertype.com (Michael Everson) Date: Mon, 3 Apr 2017 22:52:55 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> Message-ID: On 3 Apr 2017, at 20:58, Asmus Freytag wrote: > > On 4/3/2017 12:33 PM, Richard Wordingham wrote: >> If the variation selectors are ignored, these simplify to: >> >> white square >> hatched square >> specific piece >> >> This preserves all the information; the pattern of squares is known in advance and therefore redundant. >> > This assumes that you always show the full board. True, and see the two short examples at the top of page 5 of the proposal, and see the 12?12 board in Figure 5. > Under that assumption, you are correct. > > The variation selectors would then not needed, even, in the text: style markup could supply them in all cases where the data isn't raw text. 
What style markup? Nothing is defined; nothing is portable. If we use VS and put the burden on the font, we actually do the same thing that traditional lead-type setters did. > They would essentially only live in the data stream to the rendering engine, to force glyph selection, but not need to be part of the text. > > Interesting, Not entirely sure I follow, but? well, I like the proposal best because it is robust, easy to learn, and easy to use. Michael Everson From asmusf at ix.netcom.com Mon Apr 3 17:07:38 2017 From: asmusf at ix.netcom.com (Asmus Freytag (c)) Date: Mon, 3 Apr 2017 15:07:38 -0700 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> Message-ID: On 4/3/2017 2:15 PM, Michael Everson wrote: > On 3 Apr 2017, at 17:16, Asmus Freytag wrote: > >>>> The same indirection is at play here. >>> This is pure rhetoric, Asmus. It addresses the problem in no way. >> Actually it does. I'm amazed that you don't see the connection. > I?ve never understood you when you back up into that particular kind of abstract rhetoric. Sometimes thinking through something in abstract terms actually clarifies the situation. > >>>> the oft-stated fact that variation selectors may be ignored. >>> I?m aware of this. I may be wrong, but I believe you advocated for the encoding of variation sequences for mathematics purposes. 
>> Yes, for those cases where the differences are known to not carry meaning, but where duplicating all fonts or duplicating the characters would have been the wrong solution to allow support for both conventions (e.g. upright vs. slanted integral signs, details of relational operator design, etc.). > The ?meaning? of a chess-problem matrix is the whole 8 ? 8 board, not the empty dark square at b4 or the white pawn on In other words, you assert that partial boards never need to be displayed. (Let's take that as read, then). > > The ?problem? the higher-level protocol is supposed to solve is the one where a chess piece of one colour sits in an em-squared zone whether light or dark. In lead type this was a glyph issue. Lead type had just exactly what my proposal has: A piece with in-line text metrics, spaced harmoniously with digits and letters, and square sorts with and without hatching. Leaving aside the abstract question whether modeling lead type is ipso facto the best solution in all cases... > > Standardized variation sequences are the best way to achieve this simply and without needless duplication. :-) > >>> Are you saying that the empty white and black squares should use VS but the chess pieces are not? That makes no sense to me at all. >> I'm saying that perhaps it would be appropriate to select M-square glyph variants via a variation selector. That seems a clear-cut glyph *variation* to me. (If this variation is ignored, then the text looks bad, but in a way that is similar to selecting the wrong font - which is a rule-of-thumb way of evaluating whether variation selectors are appropriate). > OK, then you support the part of the proposal that applies VS1 and VS2 to the chess pieces. My statement just was that a proposal where piece + VS should be M-square, piece w/o VS should be generic, might make some sense (and same for a suitable "empty" cell). 
The next question would be whether the alternation in background is best expressed in variation sequences or by some other means. If you never need to show just a single field, then I concede that the main drawback of variation selectors for the background style is absent; however, reading ahead in your message, the partial grid appears to be common, therefore the reason to choose an alternate solution to the background style is a strong one. > >> The distinction between white/black background might be of a different nature. If you have arranged everything in a grid with the correct matrix, then the color of the background is perhaps redundant, given that there is a uniform convention for it. > Yes but you still want it to be reasonably legible when the OpenType ligatures fail. I think that this: > > ?????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????? > is far better than this: > ?????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????????????? > ??????????????????<< Is it the pawn or the queen that?s on the black square? > ?????????????????? > ?????????? > See? To parse this one you have to remember which of the white squares are the alternating black ones. I don?t consider that as legible as using both 25A1 and 25A8. > > The colour of the matrix is NOT redundant for a human reader. OK -- in that case you've actually made an argument for *duplicating *the codes for *ALL *the pieces (as well as the empties). That way, you are guaranteed that (if the font supports the glyphs) you get what you want. With variation selectors for background color, you do not get what you want for the pieces. 
Having the system use specific character codes for the empties and variation selectors for the pieces is a needless complication; just duplicate the few pieces with a hatched background. (The precise style of hatching should be left to the font - that's not something that you specify in plain text). Leave the question of requesting M-square metrics to a (single) variation selector and you are done. (the convention would be that 25A8 + VS results in an M-square glyph using some hatching that matches that of the hatched code points for chess pieces, not necessarily matching the hatching style that you get for 25A8 w/o the VS). (Alternatively, you could add a code for "dark cell" so that the hatching can be anything whether or not there's VS). Now, this model is much closer to the way VSs are used for math operators (but the reasoning may be a bit abstract, so I won't bother you with it here). > >> If you assume the characters will ever be used outside a full grid, then that assumption fails and it will not be possible to restore the intended meaning if the variation selectors are missing. That's a warning flag, that they may not be appropriate for that use. > You can't assume that they wouldn't be. All of my examples in §2 of the proposal are in fact outside of a full grid. I think the proposal as it stands ticks the most boxes. (I have changed "black square" and "white square" to "dark square" and "light square" however.) If the proposal duplicates the pieces that are on dark squares and does not use any VS sequences to select the color of the square (but only to select the M-square metrics) it would be more robust and less complex to implement. (A chess font would not need to do anything but provide the right glyphs and ignore the VS, because they would be in M-square metrics anyway). A./ -------------- next part -------------- An HTML attachment was scrubbed...
URL: From richard.wordingham at ntlworld.com Mon Apr 3 17:28:05 2017 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Mon, 3 Apr 2017 23:28:05 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <9B849393-FEA0-43CE-A4E2-CA63DBAAF425@evertype.com> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <9B849393-FEA0-43CE-A4E2-CA63DBAAF425@evertype.com> Message-ID: <20170403232805.6e90972a@JRWUBU2> On Mon, 3 Apr 2017 22:48:31 +0100 Michael Everson wrote: > Yes, this is what I've proposed. I was explaining it to Asmus and others with similar misunderstandings. Richard. From richard.wordingham at ntlworld.com Mon Apr 3 17:34:31 2017 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Mon, 3 Apr 2017 23:34:31 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> Message-ID: <20170403233431.5e5a4858@JRWUBU2> On Mon, 3 Apr 2017 15:07:38 -0700 "Asmus Freytag (c)" wrote: > Having the system use specific character codes for the empties and > variation selectors for the pieces is a needless complication; just > duplicate the few pieces with a hatched background. (The precise > style of hatching should be left to the font - that's not something > that you specify in plain text).
> Leave the question of requesting M-square metrics to a (single) > variation selector and you are done. This solution quiets my qualms. Richard. From everson at evertype.com Mon Apr 3 17:35:52 2017 From: everson at evertype.com (Michael Everson) Date: Mon, 3 Apr 2017 23:35:52 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <20170403220348.3efb4d1a@JRWUBU2> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <20170403220348.3efb4d1a@JRWUBU2> Message-ID: <421BC3D3-DF71-4D76-93E7-CAFDEDFBFFCB@evertype.com> On 3 Apr 2017, at 22:03, Richard Wordingham wrote: > Nobody said the glyphs for use in ordinary text had to be a fixed width. That's why there's a non-variant state and then two "on-square" variant states. If you want to construct a chessboard using a font, whether it is using an ASCII font or using Unicode characters with variation selectors, the glyphs in that context have to be fixed width (if you want, you know, a square chess board). > What I am saying is that the glyphs for the two new variants you are proposing need to harmonise with the block elements such as U+2581 > LOWER ONE EIGHTH BLOCK. No... in a chess font the font designer has to draw those block-element characters differently, to harmonize with the > That requires uniform width *for those variants*. That is a key part of the glyph family's essence. In their original usage in graphic terminals, sure. And some people still emulate those, and when they use those characters they draw them for that purpose. In current ASCII-based chess-fonts, a set of characters is used to draw a line (of one kind or another) around the board, and when I looked for Unicode characters to map to these, the block elements were the ones that had the right structure, since they were high and low and left and right in the em square.
> There is no such requirement on the glyphs for normal text use as at present. There is **in a chess font** if you want to be able to draw a box around the chessboard. All the ASCII-based chess fonts have glyphs for this. In the Danish Skak font (see Figure 3) the eight ASCII characters 9, _, ), |, \, 0, -, and = are used. In my proposal, I use eight Block Element characters. It works, and is flexible enough even to cater to ornate frames. > If one had a row of squares in flowing text, one would want the row to act like a word. One might have to resort to gluing it together using CGJ or WJ. What are you on about? I'm talking about making 8×8 tables, not flowing rows of chessboards within a paragraph. I mean, sure, if you wanted to do that, you'd run into line-breaking weirdness, but nobody would do that, and so that weird situation just doesn't matter. All I was saying is that SPACE and NBSP aren't the right characters to use for the light squares on a game board. >>> Also having variants of U+25A1 and U+25A8 that match the game square filter modifiers seems quite legitimate. >> >> Um, wait... What are you proposing NBSP for? I'm confused now. If you like these two characters (and I am glad you do) there's no need for >> U+00A0 at all. > > To be pedantic, I said that the proposed variants were legitimate, not that I liked them. Um, ok. I don't see that's helpful in terms of improving or modifying the proposal. I stand by my proposal, which I have implemented successfully, even quickly as with William's Quest font. > I'm talking about looking for a U+2654 glyph for ordinary text when all the first font tried has is: > > 2654 FE01; Chesspiece on white; # WHITE CHESS KING > 2654 FE02; Chesspiece on black; # WHITE CHESS KING > > I must confess I am now wondering what the format 4 cmap should say about U+2654. I really don't know about the "format 4 cmap" text. I copied it from a successful VS proposal by Ken Lunde of Adobe. What I used was the liga and rlig tables.
I didn't edit any cmap table per se and don't know how to do it. Without any VS character, 2654 just renders like an ordinary white king as drawn in the font. It only goes to a light or dark board-square glyph with the VS. > Should it give a glyph for U+2654 or not? Of course. Why wouldn't it? It's a graphic character. > I'm also wondering about Windows behaviour. There was a time when Windows 7 only supported variation sequences if they appeared in the cmap 14 subtable. I don't know. Older software often doesn't support this. Quark XPress, which has become a completely awesome typesetting program, used to be terrible at it. Maybe people typesetting chessboards would have to use something other than some apps on Windows 7, or maybe something other than Windows 7 entirely. I can't use Unicode at all really on Mac OS 9, which I use from time to time. >>> If it's looking for an OpenType font for a glyph of the family , >> >> Or any OpenType substitution string. > > Most won't be recognised as needed. If the first font lacks a ligature for , fallback won't be used for it. Grapheme clusters and > variation sequences get special treatment. I don't see how anything you're saying either identifies or tries to solve any actual problem with the proposal. The proposal says "put some substitution tables into your chess font to display a particular glyph" and some apps do that and some don't. You can't use VS with apps that don't.
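The three states described above (a bare piece for inline text, and a piece plus VS for a light or dark board square) can be sketched as a small reader. This is only an illustration under assumptions the thread does not settle: it takes VS1 (U+FE00) to mean "on a light square" and VS2 (U+FE01) to mean "on a dark square", and the function name `split_squares` is hypothetical.

```python
VS1 = "\uFE00"  # assumption: selects the glyph on a light square
VS2 = "\uFE01"  # assumption: selects the glyph on a dark square

def split_squares(text):
    """Split text into (base, background) pairs.

    background is "light" or "dark" when the base character is followed by
    VS1 or VS2, and None when it stands alone (the ordinary inline glyph).
    """
    squares = []
    for ch in text:
        if ch in (VS1, VS2):
            # A selector must follow a base that has no selector yet.
            if not squares or squares[-1][1] is not None:
                raise ValueError("variation selector without a base character")
            base, _ = squares[-1]
            squares[-1] = (base, "light" if ch == VS1 else "dark")
        else:
            squares.append((ch, None))
    return squares

# White king on a light square, empty dark square, then an inline white king.
row = "\u2654" + VS1 + "\u25A8" + VS2 + "\u2654"
print(split_squares(row))
```

An application that ignores the selectors still sees the base characters, which is the fallback behaviour discussed in this exchange.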
Michael Everson From everson at evertype.com Mon Apr 3 17:46:57 2017 From: everson at evertype.com (Michael Everson) Date: Mon, 3 Apr 2017 23:46:57 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <88b9a791-4d5c-534f-160a-ec0f521135d1@ix.netcom.com> <8A987C4A-CF9E-4EA0-A3C4-99B0DD8CEFDD@evertype.com> Message-ID: Markus, that there exist dozens of fonts designed for chessboard typesetting should suggest that people wish to use computers to do so. There are many, many volumes published on chess problems and there are some people who are passionately interested in that very specific intellectual pursuit. I really can't see any reason to second-guess or oppose a desire to have a simple and standardized way of representing that kind of data in legible plain text whose legibility can be optimized via a standardized font mechanism. Michael Everson > On 3 Apr 2017, at 22:44, Markus Scherer wrote: > > On Mon, Apr 3, 2017 at 2:33 PM, Michael Everson wrote: > On 3 Apr 2017, at 18:51, Markus Scherer wrote: >> >> It seems to me that higher-level layout (e.g, HTML+CSS) is appropriate for the board layout (e.g., via a table), board frame style, and cell/field shading. In each field, the existing characters should suffice. > > That isn't plain text. > > A lot of stuff needed for printing books and laying out PDFs and web pages goes beyond plain text. > > Whose requirement is it to represent an entire chess or checkers board in plain text? > > Other than a sort of puzzle of "what would it take to do so?"
> > markus From asmusf at ix.netcom.com Mon Apr 3 17:53:42 2017 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Mon, 3 Apr 2017 15:53:42 -0700 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <20170403232805.6e90972a@JRWUBU2> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <9B849393-FEA0-43CE-A4E2-CA63DBAAF425@evertype.com> <20170403232805.6e90972a@JRWUBU2> Message-ID: <76a2f9a0-be7a-47dd-5144-c2309a13a20d@ix.netcom.com> An HTML attachment was scrubbed... URL: From everson at evertype.com Mon Apr 3 18:30:30 2017 From: everson at evertype.com (Michael Everson) Date: Tue, 4 Apr 2017 00:30:30 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> Message-ID: <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> On 3 Apr 2017, at 23:07, Asmus Freytag (c) wrote: > > On 4/3/2017 2:15 PM, Michael Everson wrote: >> On 3 Apr 2017, at 17:16, Asmus Freytag wrote: >> >>>>> The same indirection is at play here. >>>>> >>>> This is pure rhetoric, Asmus. It addresses the problem in no way. >>>> >>> Actually it does. I'm amazed that you don't see the connection. >>> >> I've never understood you when you back up into that particular kind of abstract rhetoric. > > Sometimes thinking through something in abstract terms actually clarifies the situation. Of course I know that's your view.
It's just never been an effective communication strategy between you and me generally. >>> The "meaning" of a chess-problem matrix is the whole 8 × 8 board, not the empty dark square at b4 or the white pawn on > > In other words, you assert that partial boards never need to be displayed. (Let's take that as read, then). No, I am sure that a variety of board shapes can be set in plain text with these conventions, though the principal concern is classical chess notation. >> The "problem" the higher-level protocol is supposed to solve is the one where a chess piece of one colour sits in an em-squared zone whether light or dark. In lead type this was a glyph issue. Lead type had just exactly what my proposal has: A piece with in-line text metrics, spaced harmoniously with digits and letters, and square sorts with and without hatching. > > Leaving aside the abstract question whether modeling lead type is ipso facto the best solution in all cases... I think it was a good expedient solution in lead type and that this proposal offers a robust parseable digital version of that solution, and I assert people will make use of that data structure. >> OK, then you support the part of the proposal that applies VS1 and VS2 to the chess pieces. > > My statement just was that a proposal where piece + VS should be M-square, piece w/o VS should be generic, might make some sense (and same for a suitable "empty" cell). > > The next question would be whether the alternation in background is best expressed in variation sequences or by some other means. I think the value in the data structures I have described is best retained as text.
Anything else seems like it would be needlessly complex. > If you never need to show just a single field, then I concede that the main drawback of variation selectors for the background style is absent; however, reading ahead in your message, the partial grid appears to be common, therefore the reason to choose an alternate solution to the background style is a strong one. Well, it's text, Asmus, so you can delete all but one line of a board if you want: ?????????????????? There. So... what are you talking about? It's a text matrix. It's like a kind of poem. ?????????? ?????????????????? ?????????????????? ?????????????????? ?????????????????? ?????????????????? ?????????????????? ?????????????????? ?????????????????? ?????????? It even looks like one. That's a meaningful pattern. A kind of writing system. >> The colour of the matrix is NOT redundant for a human reader. >> > OK -- in that case you've actually made an argument for duplicating the codes for ALL the pieces (as well as the empties). Why? It's text. It's spelling. These structures are read. There's no reason to encode two letter C's because one is pronounced [k] and one [s]. > That way, you are guaranteed that (if the font supports the glyphs) you get what you want. Then you'd have to have three, because there are three kinds of things that need to be in a single font: by itself, on a light square, and on a dark square. > With variation selectors for background color, you do not get what you want for the pieces. I implemented it! It works! > Having the system use specific character codes for the empties and variation selectors for the pieces is a needless complication; just duplicate the few pieces with a hatched background. (The precise style of hatching should be left to the font - that's not something that you specify in plain text). Your idea really isn't better. > Leave the question of requesting M-square metrics to a (single) variation selector and you are done.
(the convention would be that 25A8 + VS results in an M-square glyph using some hatching that matches that of the hatched code points for chess pieces, not necessarily matching the hatching style that you get for 25A8 w/o the VS). (Alternatively, you could add a code for "dark cell" so that the hatching can be anything whether or not there's VS). You want WHITE CHESS KNIGHT, and WHITE CHESS KNIGHT ON SQUARE, and use a VS that changes the colour of the square? That is less legible in plain text than my proposal. Not as good. Detrimental to the user indeed. > Now, this model is much closer to the way VSs are used for math operators (but the reasoning may be a bit abstract, so I won't bother you with it here). I don't agree that your model is better than mine. Interesting, but not better. > If the proposal duplicates the pieces that are on dark squares and does not use any VS sequences to select the color of the square (but only to select the M-square metrics) it would be more robust and less complex to implement. (A chess font would not need to do anything but provide the right glyphs and ignore the VS, because they would be in M-square metrics anyway). Then you're still stuck for a solution for non-em-square characters for inline text.
Michael Everson From everson at evertype.com Mon Apr 3 18:31:26 2017 From: everson at evertype.com (Michael Everson) Date: Tue, 4 Apr 2017 00:31:26 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <20170403233431.5e5a4858@JRWUBU2> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <20170403233431.5e5a4858@JRWUBU2> Message-ID: <87747D84-2238-47DF-AFF1-D181FC726CFA@evertype.com> On 3 Apr 2017, at 23:34, Richard Wordingham wrote: > >> Leave the question of requesting M-square metrics to a (single) variation selector and you are done. > > This solution quiets my qualms. It does not meet my requirement, and it solves no problem. Michael Everson From kent.karlsson14 at telia.com Mon Apr 3 18:45:01 2017 From: kent.karlsson14 at telia.com (Kent Karlsson) Date: Tue, 04 Apr 2017 01:45:01 +0200 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: Message-ID: I can well imagine people deeply interested in chess, to want to exchange chess board layouts in plain text emails (or at least not use quite hard-to-handle HTML code), and even parse them (programmatically) for analysis by a program, not wanting to bother with quite complex HTML/CSS stuff. Including making input easy (keyboard, palette), just "typing" the chess board layout (with pieces). But for HTML pages on chess, HTML/CSS markup is certainly preferable; but it shouldn't be impossible to just paste in a "plain text" chess board to an HTML page (with minimal formatting effort). One can (fairly easily) make a program to convert the "plain text" chess board to an HTML one. Book formatting? 
Old style book formatting still cannot use as sophisticated layouts as HTML can... (AFAIK). /Kent K Den 2017-04-03 23:44, skrev "markus.icu at gmail.com" : > On Mon, Apr 3, 2017 at 2:33 PM, Michael Everson wrote: >> On 3 Apr 2017, at 18:51, Markus Scherer wrote: >>> >>> It seems to me that higher-level layout (e.g, HTML+CSS) is appropriate for >>> the board layout (e.g., via a table), board frame style, and cell/field >>> shading. In each field, the existing characters should suffice. >> >> That isn't plain text. > > A lot of stuff needed for printing books and laying out PDFs and web pages > goes beyond plain text. > > Whose requirement is it to represent an entire chess or checkers board in > plain text? > > Other than a sort of puzzle of "what would it take to do so?" > > markus > -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.wordingham at ntlworld.com Mon Apr 3 18:47:01 2017 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Tue, 4 Apr 2017 00:47:01 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <421BC3D3-DF71-4D76-93E7-CAFDEDFBFFCB@evertype.com> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <20170403220348.3efb4d1a@JRWUBU2> <421BC3D3-DF71-4D76-93E7-CAFDEDFBFFCB@evertype.com> Message-ID: <20170404004701.19ad750c@JRWUBU2> On Mon, 3 Apr 2017 23:35:52 +0100 Michael Everson wrote: > On 3 Apr 2017, at 22:03, Richard Wordingham > wrote: The relevant text before was, "I'm talking about looking for a U+2654 glyph for ordinary text when all the first font tried has is: 2654 FE01; Chesspiece on white; # WHITE CHESS KING 2654 FE02; Chesspiece on black; # WHITE CHESS KING" > > Should it give a glyph for U+2654 or not? > Of course. Why wouldn't it? It's a graphic character.
What my conceptual example font has is not the sort of glyph one would want for sentences like "Alice ? d4 meets White Queen ? (with shawl)". > I don't see how anything you're saying either identifies or tries to > solve any actual problem with the proposal. The proposal says "put > some substitution tables into your chess font to display a particular > glyph" and some apps do that and some don't. You can't use VS with > apps that don't. I'm trying to work out whether we need a variation sequence for "chesspiece in a sentence". We need the advice of someone who's worked on font fallback. You don't need substitution tables to be executed if your application can just look up glyphs for variation sequences. Richard. From richard.wordingham at ntlworld.com Mon Apr 3 18:59:42 2017 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Tue, 4 Apr 2017 00:59:42 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> Message-ID: <20170404005942.4aa42021@JRWUBU2> On Tue, 4 Apr 2017 00:30:30 +0100 Michael Everson wrote: > On 3 Apr 2017, at 23:07, Asmus Freytag (c) > wrote: > You want WHITE CHESS KNIGHT, and WHITE CHESS KNIGHT ON SQUARE, and > use a VS that changes the colour of the square? That is less legible > in plain text than my proposal. Not as good. Detrimental to the user > indeed.
No, he wants two characters WHITE CHESS KNIGHT and WHITE CHESS KNIGHT ON DARK BACKGROUND, and a variation selector, say VS2, that when applied to them yields a glyph that works with block elements. It might be simpler if WHITE CHESS KNIGHT ON DARK BACKGROUND was defined as a character that worked with block elements. > Then you're still stuck for a solution for non-em-square characters > for inline text. No, WHITE CHESS KNIGHT should continue to fulfil that role. My only worry is that one might need a variation selector, say VS1, to force the choice of a suitable glyph. Richard. From everson at evertype.com Mon Apr 3 19:06:16 2017 From: everson at evertype.com (Michael Everson) Date: Tue, 4 Apr 2017 01:06:16 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <76a2f9a0-be7a-47dd-5144-c2309a13a20d@ix.netcom.com> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <9B849393-FEA0-43CE-A4E2-CA63DBAAF425@evertype.com> <20170403232805.6e90972a@JRWUBU2> <76a2f9a0-be7a-47dd-5144-c2309a13a20d@ix.netcom.com> Message-ID: <7E773027-B43E-458D-B234-7D8F0B6BE34E@evertype.com> On 3 Apr 2017, at 23:53, Asmus Freytag wrote: > Alternatively, a system that uses no Variation selectors and only relies on Opentype ligatures might work even better. > > This would require one Empty and one Filled board cell, to ligate with whatever piece is supposed to sit on top of it. The use of Empty / Filled board cell would result in the correct metrics and by encoding an empty cell that is different from a "white square", there's no need to overload the use of the latter. That would not be more legible (quite the opposite, in fact) because it would add variable length to lines for every character in the 8 × 
8 matrix in any environment where the ligation failed. That's not chess-legible. Michael Everson From everson at evertype.com Mon Apr 3 19:10:15 2017 From: everson at evertype.com (Michael Everson) Date: Tue, 4 Apr 2017 01:10:15 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: Message-ID: On 4 Apr 2017, at 00:45, Kent Karlsson wrote: > > Book formatting? Old style book formatting still cannot use as sophisticated layouts as HTML can... (AFAIK). Yeah, but come on, the chief use of chess characters is to cite them inline in text like any other symbol @ ? % & and the other equally chief use of chess characters is to set 8 × 8 chessboards which float in space in the layout as figures. The layout requirement isn't all that demanding that HTML offers a major advantage. Michael Everson From everson at evertype.com Mon Apr 3 19:15:19 2017 From: everson at evertype.com (Michael Everson) Date: Tue, 4 Apr 2017 01:15:19 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <20170404004701.19ad750c@JRWUBU2> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <20170403220348.3efb4d1a@JRWUBU2> <421BC3D3-DF71-4D76-93E7-CAFDEDFBFFCB@evertype.com> <20170404004701.19ad750c@JRWUBU2> Message-ID: <498563FC-8DB3-4E6B-80C9-71F060D5266C@evertype.com> On 4 Apr 2017, at 00:47, Richard Wordingham wrote: > I'm trying to work out whether we need a variation sequence for > "chesspiece in a sentence". Of course! Haven't you ever seen chess problem texts? Check out the Fairy Chess proposal for encoding additional characters. Plenty of examples there. > We need the advice of someone who's worked on font fallback. > > You don't need substitution tables to be executed if your application can just look up glyphs for variation sequences. That's the same thing.
sub characterA characterB by glyphC; It works reliably in a number of environments, though I think some screenshots I sent are in mails which have not got through. Michael Everson From everson at evertype.com Mon Apr 3 19:22:00 2017 From: everson at evertype.com (Michael Everson) Date: Tue, 4 Apr 2017 01:22:00 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <20170404005942.4aa42021@JRWUBU2> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> <20170404005942.4aa42021@JRWUBU2> Message-ID: <40133EC9-7DF7-40D5-B243-02DB62693B7F@evertype.com> On 4 Apr 2017, at 00:59, Richard Wordingham wrote: > No, he wants two characters WHITE CHESS KNIGHT and WHITE CHESS KNIGHT ON DARK BACKGROUND, and a variation selector, say VS2, that when applied to them yields a glyph that works with block elements. > > It might be simpler if WHITE CHESS KNIGHT ON DARK BACKGROUND was defined as a character that worked with block elements. I can't fathom how you would configure a font to do whatever it is you think you're describing here. I don't follow it. ...worked with which block elements, to do what? If it's to draw a box around the board, I already said, the answer is to change the graphics terminal block elements because in a chess-font environment their positional function is used, not their graphics terminal glyph. >> Then you're still stuck for a solution for non-em-square characters for inline text. > > No, WHITE CHESS KNIGHT should continue to fulfil that role.
My only worry is that one might need a variation selector, say VS1, to force the choice of a suitable glyph. I don't get what you're on about. I've already solved this problem, and whatever it is you're describing sure doesn't sound intuitive. I've shown my implementations which do what I need them to do. I don't know if you can do the same, but go ahead and make your font to prove it, and write it up clearly in a counter-proposal if you think it's the right way to . Michael Everson From everson at evertype.com Mon Apr 3 19:30:05 2017 From: everson at evertype.com (Michael Everson) Date: Tue, 4 Apr 2017 01:30:05 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <20170404004701.19ad750c@JRWUBU2> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <20170403220348.3efb4d1a@JRWUBU2> <421BC3D3-DF71-4D76-93E7-CAFDEDFBFFCB@evertype.com> <20170404004701.19ad750c@JRWUBU2> Message-ID: > I'm trying to work out whether we need a variation sequence for > "chesspiece in a sentence". Of course! Haven't you ever seen chess problem texts? Check out the Fairy Chess proposal for encoding additional characters. Plenty of examples there. Sorry, I meant "Of course **not**!" that is, chesspiece in a sentence is extremely common, and should be the default (not stylized) form. We can't repurpose that to be "chesspiece on a white square" because it hasn't been previously and changing that would affect the layout of existing data. Michael Everson -------------- next part -------------- An HTML attachment was scrubbed...
URL: From kent.karlsson14 at telia.com Mon Apr 3 20:01:33 2017 From: kent.karlsson14 at telia.com (Kent Karlsson) Date: Tue, 04 Apr 2017 03:01:33 +0200 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: Message-ID: Den 2017-04-04 02:10, skrev "Michael Everson" : > On 4 Apr 2017, at 00:45, Kent Karlsson wrote: >> >> Book formatting? Old style book formatting still cannot use as sophisticated >> layouts as HTML can... (AFAIK). > > Yeah, but come on, the chief use of chess characters is to cite them inline in > text like any other symbol @ ? % & and the other equally chief use of chess > characters is to set 8 × 8 chessboards which float in space in the layout as > figures. The layout requirement isn't all that demanding that HTML offers a > major advantage. In case you missed it, the statement I made above was in *SUPPORT* of your proposal (in general, but not necessarily all details)... /Kent K > Michael Everson From everson at evertype.com Mon Apr 3 20:12:16 2017 From: everson at evertype.com (Michael Everson) Date: Tue, 4 Apr 2017 02:12:16 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: Message-ID: > On 4 Apr 2017, at 02:01, Kent Karlsson wrote: > >>> Book formatting? Old style book formatting still cannot use as sophisticated layouts as HTML can... (AFAIK). >> >> Yeah, but come on, the chief use of chess characters is to cite them inline in text like any other symbol @ ? % & and the other equally chief use of chess characters is to set 8 × 8 chessboards which float in space in the layout as figures. The layout requirement isn't all that demanding that HTML offers a major advantage. > > In case you missed it, the statement I made above was in *SUPPORT* of your proposal (in general, but not necessarily all details)... It's not easy to tell because counterapproaches suggested are not well specified and really don't seem to be practical.
It *is* important that there be an even number of characters in every row of 8 squares for fallback display to be better rather than worse, I think. I don’t think it’s possible to ensure that the rendering engine of every app displays the fallback identically. (It seems that Word and LibreOffice and Pages and Quark display a little differently; this seems to be because they load glyphs from some fonts before glyphs from others.) I found while setting the tables that it was convenient to have to remember that every one of the 64 characters had to have VS1 or VS2 along with it. Constructing a table from scratch and modifying an existing one both felt easier with uniform encoding. Michael Everson From asmusf at ix.netcom.com Mon Apr 3 20:21:38 2017 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Mon, 3 Apr 2017 18:21:38 -0700 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> Message-ID: An HTML attachment was scrubbed... URL: From kent.karlsson14 at telia.com Mon Apr 3 20:51:53 2017 From: kent.karlsson14 at telia.com (Kent Karlsson) Date: Tue, 04 Apr 2017 03:51:53 +0200 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: Message-ID: On 2017-04-04 03:21, "Asmus Freytag" wrote: > would look like this, if you base your proposal on ligatures rather than > variation selectors (minimal case A above): > > ?????????????????????? That line has a lot of VSs in it... 
(I see them, since they happen to be visible in the email app I use.) > The disadvantage is that the fallback rendering does not line up; but I would > regard that as a minor issue. I think Michael regards that non-lineup as a show-stopper. /Kent K -------------- next part -------------- An HTML attachment was scrubbed... URL: From kent.karlsson14 at telia.com Mon Apr 3 20:51:57 2017 From: kent.karlsson14 at telia.com (Kent Karlsson) Date: Tue, 04 Apr 2017 03:51:57 +0200 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: Message-ID: On 2017-04-04 03:12, "Michael Everson" wrote: > It *is* important that there be an even number of characters in every row of 8 > squares for fallback display to be better rather than worse, I think. I agree. (Though *at present*, I happen to get a visible display of the VSs in the email app, which does not look too good.) > I found while setting the tables that it was convenient to have to remember > that every one of the 64 characters had to have VS1 or VS2 along with it. > Constructing a table from scratch and modifying an existing one both felt > easier with uniform encoding. Yes. BUT, I would hope that chess enthusiasts would not have to think much about the encoding. Either using a special keyboard layout (momentarily) or using a palette for picking board item by board item seem to be better options. I'm sure someone will make a browser based chess editor, complete with suitable palette, and having an empty board pre-edited to start out (replacing the empty squares as pieces are laid out or moved). 
/Kent K From kent.karlsson14 at telia.com Mon Apr 3 21:00:45 2017 From: kent.karlsson14 at telia.com (Kent Karlsson) Date: Tue, 04 Apr 2017 04:00:45 +0200 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <421BC3D3-DF71-4D76-93E7-CAFDEDFBFFCB@evertype.com> Message-ID: On 2017-04-04 00:35, "Michael Everson" wrote: >> What I am saying is that the glyphs for the two new variants you are >> proposing need to harmonise with the block elements such as U+2581 >> LOWER ONE EIGHTH BLOCK. > > No… in a chess font the font designer has to draw those block-element > characters differently, to harmonize with the > >> That requires uniform width *for those variants*. That is a key part of the >> glyph family's essence. > > In their original usage in graphic terminals, sure. And some people still > emulate those, and when they use those characters they draw them for that > purpose. In current ASCII-based chess-fonts, a set of characters is used to > draw a line (of one kind or another) around the board, and when I looked for > Unicode characters to map to these, the block elements were the ones that had > the right structure, since they were high and low and left and right in the em > square. > >> There is no such requirement on the glyphs for normal text use as at present. > > There is **in a chess font** if you want to be able to draw a box around the > chessboard. I'm not too happy about this. Maybe have VSs applied also to the chess box drawing chars? /Kent K From gerrietm at icloud.com Mon Apr 3 21:39:57 2017 From: gerrietm at icloud.com (Gerriet M. Denkmann) Date: Tue, 4 Apr 2017 09:39:57 +0700 Subject: Combining Class of Thai Nonspacing_Marks In-Reply-To: References: Message-ID: > On Mon, 3 Apr 2017 14:12:51 +0700 > "Gerriet M. Denkmann" wrote: > >> The Combining Class is used for normalisation of strings. >> Normalisation of strings is important for filenames in filesystems. 
>> >> As far as I know, a Thai consonant (Lo, Other_Letter) can have >> several Nonspacing_Marks. This cluster of nonspacing marks can >> contain at most one top/bottom vowel and at most one tone/other mark. >> There is no syntactic meaning in the order of these nonspacing >> marks. > > You're confusing the modern Thai language with the Thai script. It > seems that the Lao-style usage of NIKHAHIT as a vowel is known from > older Thai writing, and when used this way it could of course take a > tone mark. It also seems that the pressure to have both MAITAIKHU and > a tone mark on a consonant has been accepted for at least one minority > language. I stand corrected. I know nothing about other languages written with Thai characters. So the rule should be: A consonant may have zero or one tone/other marks and also zero or one top/bottom vowels. Exceptions: NIKHAHIT + tone mark (no top/bottom vowel) MAITAIKHU + tone mark (no top/bottom vowel) The order of these has no semantic meaning. All top/bottom vowels should have Combining Class 103, other marks should have Combining Class x (with 103 < x < 107), tone marks should have Combining Class 107. Is anybody working on or is responsible for these things? Kind regards, Gerriet. From richard.wordingham at ntlworld.com Tue Apr 4 02:55:24 2017 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Tue, 4 Apr 2017 08:55:24 +0100 Subject: Combining Class of Thai Nonspacing_Marks In-Reply-To: References: Message-ID: <20170404085524.7a9bcfeb@JRWUBU2> On Tue, 4 Apr 2017 09:39:57 +0700 "Gerriet M. Denkmann" wrote: > So the rule should be: > > A consonant may have zero or one tone/other marks and also zero or > one top/bottom vowels. Exceptions: > NIKHAHIT + tone mark (no top/bottom vowel) > MAITAIKHU + tone mark (no top/bottom vowel) This list is not exhaustive. The order of MAITAIKHU and tone mark is significant - it should affect rendering. 
Formally, the Unicode Standard makes the point that the order of vowel above and tone mark is significant. > The order of these has no semantical meaning. This is true for the combination of a mark above and a mark below. For marks below, contrasting orders may be prevented (to a first approximation) by the chaos of the canonical combining classes. > All top/bottom vowels should have Combining Class 103, > other marks should have Combining Class x (with 103 < x < 107), > tone marks should have Combining Class 107. > > Is anybody working on or is responsible for these things? Unicode combining classes cannot be changed. All that can be done is to enforce the order of characters in normalised text. Asmus Freytag has been working on an extreme version of that which disallows minority languages in certain parts of domain names, and there is some pressure to start using dotted circles in rendering so as to punish transgressors, counterbalanced by the feeling that one shouldn't be suppressing minority languages. Marshal Phibun's jackboots are getting some exercise. There is some input checking, loosely based on WTT (Wing Thuk Thi). This may be implemented in such a way as to support the prohibition of minority languages. Richard. From duerst at it.aoyama.ac.jp Tue Apr 4 03:28:19 2017 From: duerst at it.aoyama.ac.jp (Martin J. Dürst) Date: Tue, 4 Apr 2017 17:28:19 +0900 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: Message-ID: On 2017/04/03 23:41, Kent Karlsson wrote: > Hence the chess board lines should be displayed in a strong left-to-right > context (either via bidi markup characters, or via some higher order > bidi markup mechanism, such as the "bidi" attribute in HTML). Though in > most cases (not Arabic/Hebrew/... document), the bidi context will default > to left-to-right... There never was a "bidi" attribute in HTML. You probably mean the "dir" attribute. Regards, Martin. 
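A note on the combining-class point raised in the Thai thread above: the fixed canonical combining classes, and their effect on normalisation, can be inspected directly. What follows is only an illustrative sketch using Python's standard unicodedata module. The helper names ccc_table and nfc and the sample strings are made up for this note, not taken from any message in the thread, and the values reported depend on the Unicode data shipped with the interpreter.

```python
import unicodedata

KO_KAI = "\u0E01"  # THAI CHARACTER KO KAI, used here as a sample base consonant

# A few Thai nonspacing marks mentioned in the thread (SARA I, SARA U,
# MAITAIKHU, MAI EK, NIKHAHIT); the selection is arbitrary.
MARKS = "\u0E34\u0E38\u0E47\u0E48\u0E4D"


def ccc_table(chars):
    """Return (code point, name, canonical combining class) for each character."""
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c), unicodedata.combining(c))
        for c in chars
    ]


def nfc(text):
    """Normalise to NFC; canonical reordering only moves marks with non-zero class."""
    return unicodedata.normalize("NFC", text)


for row in ccc_table(MARKS):
    print(row)

# Tone mark typed before a below-vowel: the two have different non-zero classes,
# so normalisation puts them into one canonical order.
print([f"U+{ord(c):04X}" for c in nfc(KO_KAI + "\u0E48" + "\u0E38")])

# Tone mark typed before an above-vowel with class 0: nothing is reordered,
# so both orders remain distinct strings after normalisation.
print([f"U+{ord(c):04X}" for c in nfc(KO_KAI + "\u0E48" + "\u0E34")])
```

This is consistent with Richard's remark that only the mark-above plus mark-below combination is order-insensitive: marks with class 0 block reordering, so their relative order survives normalisation.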
From verdy_p at wanadoo.fr Tue Apr 4 08:00:07 2017 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Tue, 4 Apr 2017 15:00:07 +0200 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> Message-ID: 2017-04-04 1:30 GMT+02:00 Michael Everson : > On 3 Apr 2017, at 23:07, Asmus Freytag (c) wrote: > > > > On 4/3/2017 2:15 PM, Michael Everson wrote: > >> On 3 Apr 2017, at 17:16, Asmus Freytag wrote: > >> > >>>>> The same indirection is at play here. > >>>>> > >>>> This is pure rhetoric, Asmus. It addresses the problem in no way. > >>>> > >>> Actually it does. I'm amazed that you don't see the connection. > >>> > >> I’ve never understood you when you back up into that particular kind of > abstract rhetoric. > > > > Sometimes thinking through something in abstract terms actually > clarifies the situation. > > Of course I know that’s your view. It’s just never been an effective > communication strategy between you and me generally. > > >>> The “meaning” of a chess-problem matrix is the whole 8 × 8 board, not > the empty dark square at b4 or the white pawn on > > > > In other words, you assert that partial boards never need to be > displayed. (Let's take that as read, then). > > No, I am sure that a variety of board shapes can be set in plain text with > these conventions, though the principal concern is classical chess notation. > > >> The “problem” 
the higher-level protocol is supposed to solve is the one > where a chess piece of one colour sits in an em-squared zone whether light > or dark. In lead type this was a glyph issue. Lead type had just exactly > what my proposal has: A piece with in-line text metrics, spaced > harmoniously with digits and letters, and square sorts with and without > hatching. > > > > Leaving aside the abstract question whether modeling lead type is ipso > facto the best solution in all cases? > > I think it was a good expedient solution in lead type and that this > proposal offers a robust parseable digital version of that solution, and I > assert people will make use of that data structure. > > >> OK, then you support the part of the proposal that applies VS1 and VS2 > to the chess pieces. > > > > My statement just was that a proposal where piece + VS should be > M-square, piece w/o VS should be generic, might make some sense (and same > for a suitable "empty" cell). > > > > The next question would be whether the alternation in background is best > expressed in variation sequences or by some other means. > > I think the value in the data structures I have described is best retained > as text. Anything else just seems it would be simply needlessly complex, > > > If you never need to show just a single field, then I concede that the > main drawback of variation selectors for the background style is absent; > however, reading ahead in your message, the partial grid appears to be > common, therefore the reason to choose an alternate solution to the > background style is a strong one. > > Well, it’s text, Asmus, so you can delete all but one line of a board if > you want: > > ?????????????????? > > There. So… what are you talking about? It’s a text matrix. It’s like a > kind of poem. > > ?????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????? 
> > It even looks like one. That’s a meaningful pattern. A kind of writing > system. > For me it looks like ASCII art, a hack mixing various characters intended for different uses and ignoring all semantics, only working because it reuses similar-looking glyphs instead of being an actual encoding. That representation is absolutely not semantically coherent. If we want to have true checkerboard cells, we need characters specifically for them, and in them we'll place (or not) chess pieces or any other suitable symbol or letter. This means creating clusters (cell+ZWJ+piece). This will be coherent. If we want to have borders for boards, we need coherent characters for them (we do not expect them to be combined with pieces, just that they will properly glue with cells in the middle of the board, and that their metrics match them in suitable fonts). The fact that legacy renderers or fonts won't display that correctly is definitely not an argument. Many scripts still have problems being represented with legacy renderers or fonts. But the encoding is made to be coherent semantically. Fonts and renderers will adapt their properties to render what is semantically wanted and that will be also pleasing to read, and they still will be able to use various variants (e.g. emoji styles for pieces, possibly with 3D effects and colors, possibly animated pieces, or alternate decorative patterns in board cells, possibly photographic-based, such as wood, marble, grass, sand, glass, iron...) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From verdy_p at wanadoo.fr Tue Apr 4 08:11:17 2017 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Tue, 4 Apr 2017 15:11:17 +0200 Subject: Tags and custom vector glyph emoji (from Re: Tailoring the Marketplace (is: Re: Unicode Emoji 5.0 characters now final)) In-Reply-To: <12034663.23788.1491301116912.JavaMail.defaultUser@defaultHost> References: <11364706.56745.1491240615392.JavaMail.root@webmail12.bt.ext.cpcloud.co.uk> <7794797.61788.1491243181509.JavaMail.defaultUser@defaultHost> <12034663.23788.1491301116912.JavaMail.defaultUser@defaultHost> Message-ID: 2017-04-04 12:18 GMT+02:00 William_J_G Overington : > > ... developers prefer investing time in SVG renderers or existing font > technologies for OpenType (SVG fonts will come later when it will be > capable of doing the same things as OpenType, for now it does not cover all > the existing needs). > > Well, I do not know what developers prefer. There seems to be a need to > send custom emoji in interoperable Unicode plain text and I have put > forward an idea for how to do it. > You only know what you prefer in isolation: can't you see that what you propose is even less powerful than a **STANDARD** SVG path? It already has everything you "propose", except that it is already widely implemented and developers will prefer reusing it directly. An SVG path looks like "M100,100h800v800h-800z" to draw an 800-sized square centered in a 1000-sized square; there's no need for "x" or "y", there are shortcuts already defined for horizontal or vertical strokes (using relative or absolute coordinates) and path closure, and it supports straight segments, cubic and quadratic splines and elliptic arcs. Its internal "machine" is very well documented (with extensive conformance tests for renderers, including for all supported geometric transforms and conversion of paths for creating stroke styles instead of filling them directly). -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From otto.stolz at uni-konstanz.de Tue Apr 4 08:21:02 2017 From: otto.stolz at uni-konstanz.de (Otto Stolz) Date: Tue, 4 Apr 2017 15:21:02 +0200 Subject: Encoding of old compatibility characters In-Reply-To: <83fuht6fqg.fsf@gnu.org> References: <92ba6970-86e1-5d80-e3c9-239283a384b0@gmail.com> <41b2170a-6efb-518d-8c02-3881fbb09bae@kli.org> <2ba990ce-9d57-4e8b-b4dd-e9f1a821cd3b@gmail.com> <4q7f39oed2.fsf@chem.ox.ac.uk> <2d2b2a87-f4d8-7f28-59de-f6cf7437c9c5@ix.netcom.com> <7e7af7d6-dfc4-159a-832f-e60f24136b0f@gmail.com> <83fuht6fqg.fsf@gnu.org> Message-ID: On 31.03.2017 at 09:57, Eli Zaretskii wrote: > Arial Unicode MS supports that character [U+23E8], FWIW. Not on my good ole Windows XP SP3 system. Best wishes, Otto From eliz at gnu.org Tue Apr 4 09:58:33 2017 From: eliz at gnu.org (Eli Zaretskii) Date: Tue, 04 Apr 2017 17:58:33 +0300 Subject: Encoding of old compatibility characters In-Reply-To: (message from Otto Stolz on Tue, 4 Apr 2017 15:21:02 +0200) References: <92ba6970-86e1-5d80-e3c9-239283a384b0@gmail.com> <41b2170a-6efb-518d-8c02-3881fbb09bae@kli.org> <2ba990ce-9d57-4e8b-b4dd-e9f1a821cd3b@gmail.com> <4q7f39oed2.fsf@chem.ox.ac.uk> <2d2b2a87-f4d8-7f28-59de-f6cf7437c9c5@ix.netcom.com> <7e7af7d6-dfc4-159a-832f-e60f24136b0f@gmail.com> <83fuht6fqg.fsf@gnu.org> Message-ID: <838tngp6cm.fsf@gnu.org> > From: Otto Stolz > Date: Tue, 4 Apr 2017 15:21:02 +0200 > > On 31.03.2017 at 09:57, Eli Zaretskii wrote: > > Arial Unicode MS supports that character [U+23E8], FWIW. > > Not on my good ole Windows XP SP3 system. This here is also XP SP3. Maybe some package I have installed updated the font? 
From wjgo_10009 at btinternet.com Tue Apr 4 05:18:36 2017 From: wjgo_10009 at btinternet.com (William_J_G Overington) Date: Tue, 4 Apr 2017 11:18:36 +0100 (BST) Subject: Tags and custom vector glyph emoji (from Re: Tailoring the Marketplace (is: Re: Unicode Emoji 5.0 characters now final)) In-Reply-To: References: <11364706.56745.1491240615392.JavaMail.root@webmail12.bt.ext.cpcloud.co.uk> <7794797.61788.1491243181509.JavaMail.defaultUser@defaultHost> Message-ID: <12034663.23788.1491301116912.JavaMail.defaultUser@defaultHost> Philippe Verdy wrote: > What you are describing is reinventing the wheel, notably basically what SVG paths already define. Well, I am trying to express, within a tag sequence that could be included in an interoperable Unicode plain text message, the glyph information for one emoji glyph of an OpenType colour font. I have not included anything about SVG. > Font encoding technologies define their own system using multiple tables and a compact dictionary of tables with binary encoding, not suitable for inclusion in plain-text. Yes, that is why I have devised this format, so that the glyph information for one emoji glyph of an OpenType colour font could be included in a Unicode plain text message. > Note also that Emojis could be animated when rendered on screen (that's what we already see in many implementations using GIF icons for their emojis, even if they are not easily resizable). Animated SVG for now is still in beta but starts being used on some sites and rendered by web browsers. SVG images may also be scripted and may include accessibility features (e.g. with sound played or hint bubbles displayed when hovering them). The format that I suggested could be extended if desired. For example, h is for an unanimated glyph. There could be added q and e if desired, so that instead of h one uses q for completing the glyph for each frame, and then e to export the complete animated glyph. For example, as follows. 
q means {define a complete glyph of advance width w from the glyph or glyphs in the glyphs buffer and place it in the animation buffer; reset everything except the animation buffer ready to define the next glyph in the animation;} e means {produce an animated glyph from the contents of the animation buffer ready for access by the main program; halt;} Yes, accessibility features are important and I will try to think about including them. Readers are welcome to make suggestions as to what is needed. > You only cover a part of what is needed .... Well, yes, I suppose so, yet what I have published could get something started and anything else that is needed could be added, either by me or by the Unicode Technical Committee and the Emoji Subcommittee if people are interested in implementing the idea. > .... but hope that someone will invest time to implement it in a renderer: Well, yes eventually. I am hoping that the idea will be discussed in the mailing list and then go forward to the Emoji Subcommittee and then go to the Unicode Technical Committee and then become part of The Unicode Standard and then be used by people. Many people think of new encoding ideas and put them forward to the Unicode Technical Committee, sometimes starting with a post in this mailing list before a formal submission in the hope that the discussion will be helpful. Such discussion often improves the formal submission. That is the process, the way that Unicode progresses. > ... developers prefer investing time in SVG renderers or existing font technologies for OpenType (SVG fonts will come later when it will be capable of doing the same things as OpenType, for now it does not cover all the existing needs). Well, I do not know what developers prefer. There seems to be a need to send custom emoji in interoperable Unicode plain text and I have put forward an idea for how to do it. 
William Overington Tuesday 4 April 2017 From asmusf at ix.netcom.com Tue Apr 4 11:51:45 2017 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Tue, 4 Apr 2017 09:51:45 -0700 Subject: Encoding of old compatibility characters In-Reply-To: <838tngp6cm.fsf@gnu.org> References: <92ba6970-86e1-5d80-e3c9-239283a384b0@gmail.com> <41b2170a-6efb-518d-8c02-3881fbb09bae@kli.org> <2ba990ce-9d57-4e8b-b4dd-e9f1a821cd3b@gmail.com> <4q7f39oed2.fsf@chem.ox.ac.uk> <2d2b2a87-f4d8-7f28-59de-f6cf7437c9c5@ix.netcom.com> <7e7af7d6-dfc4-159a-832f-e60f24136b0f@gmail.com> <83fuht6fqg.fsf@gnu.org> <838tngp6cm.fsf@gnu.org> Message-ID: <3cf59c63-ee7e-a805-d8d3-84b1597b20e7@ix.netcom.com> An HTML attachment was scrubbed... URL: From asmusf at ix.netcom.com Tue Apr 4 11:55:04 2017 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Tue, 4 Apr 2017 09:55:04 -0700 Subject: Combining Class of Thai Nonspacing_Marks In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From mark at macchiato.com Tue Apr 4 11:58:18 2017 From: mark at macchiato.com (Mark Davis ☕️) Date: Tue, 4 Apr 2017 18:58:18 +0200 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> Message-ID: Amusing as this is, hard to believe that people are spending this much time on an April Fool's posting. I'm looking forward to similar postings on checkers and go pieces. As a matter of fact, one that proposes adding new characters for every possible configuration of a go board would be imaginative. 
And I'm also looking forward to the ?+ZWJ+?? (etc) proposal. Mark Mark On Tue, Apr 4, 2017 at 3:00 PM, Philippe Verdy wrote: > > > 2017-04-04 1:30 GMT+02:00 Michael Everson : > >> On 3 Apr 2017, at 23:07, Asmus Freytag (c) wrote: >> > >> > On 4/3/2017 2:15 PM, Michael Everson wrote: >> >> On 3 Apr 2017, at 17:16, Asmus Freytag wrote: >> >> >> >>>>> The same indirection is at play here. >> >>>>> >> >>>> This is pure rhetoric, Asmus. It addresses the problem in no way. >> >>>> >> >>> Actually it does. I'm amazed that you don't see the connection. >> >>> >> >> I’ve never understood you when you back up into that particular kind >> of abstract rhetoric. >> > >> > Sometimes thinking through something in abstract terms actually >> clarifies the situation. >> >> Of course I know that’s your view. It’s just never been an effective >> communication strategy between you and me generally. >> >> >>> The “meaning” of a chess-problem matrix is the whole 8 × 8 board, not >> the empty dark square at b4 or the white pawn on >> > >> > In other words, you assert that partial boards never need to be >> displayed. (Let's take that as read, then). >> >> No, I am sure that a variety of board shapes can be set in plain text >> with these conventions, though the principal concern is classical chess >> notation. >> >> >> The “problem” the higher-level protocol is supposed to solve is the >> one where a chess piece of one colour sits in an em-squared zone whether >> light or dark. In lead type this was a glyph issue. Lead type had just >> exactly what my proposal has: A piece with in-line text metrics, spaced >> harmoniously with digits and letters, and square sorts with and without >> hatching. >> > >> > Leaving aside the abstract question whether modeling lead type is ipso >> facto the best solution in all cases? 
>> >> I think it was a good expedient solution in lead type and that this >> proposal offers a robust parseable digital version of that solution, and I >> assert people will make use of that data structure. >> >> >> OK, then you support the part of the proposal that applies VS1 and VS2 >> to the chess pieces. >> > >> > My statement just was that a proposal where piece + VS should be >> M-square, piece w/o VS should be generic, might make some sense (and same >> for a suitable "empty" cell). >> > >> > The next question would be whether the alternation in background is >> best expressed in variation sequences or by some other means. >> >> I think the value in the data structures I have described is best >> retained as text. Anything else just seems it would be simply needlessly >> complex, >> >> > If you never need to show just a single field, then I concede that the >> main drawback of variation selectors for the background style is absent; >> however, reading ahead in your message, the partial grid appears to be >> common, therefore the reason to choose an alternate solution to the >> background style is a strong one. >> >> Well, it’s text, Asmus, so you can delete all but one line of a board if >> you want: >> >> ?????????????????? >> >> There. So… what are you talking about? It’s a text matrix. It’s like a >> kind of poem. >> >> ?????????? >> ?????????????????? >> ?????????????????? >> ?????????????????? >> ?????????????????? >> ?????????????????? >> ?????????????????? >> ?????????????????? >> ?????????????????? >> ?????????? >> >> It even looks like one. That’s a meaningful pattern. A kind of writing >> system. >> > > For me it looks like ASCII art, a hack mixing various characters intended > for different uses and ignoring all semantics, only working because it > reuses similar-looking glyphs instead of being an actual encoding. > That representation is absolutely not semantically coherent. 
> > If we want to have true checkerboard cells, we need characters specifically > for them, and in them we'll place (or not) chess pieces or any other > suitable symbol or letter. This means creating clusters (cell+ZWJ+piece). > This will be coherent. > > If we want to have borders for boards, we need coherent characters for > them (we do not expect them to be combined with pieces, just that they will > properly glue with cells in the middle of the board, and that their metrics > match them in suitable fonts). > > The fact that legacy renderers or fonts won't display that correctly is > definitely not an argument. Many scripts still have problems being > represented with legacy renderers or fonts. But the encoding is made to be > coherent semantically. Fonts and renderers will adapt their properties to > render what is semantically wanted and that will be also pleasing to read, > and they still will be able to use various variants (e.g. emoji styles for > pieces, possibly with 3D effects and colors, possibly animated pieces, or > alternate decorative patterns in board cells, possibly photographic-based, > such as wood, marble, grass, sand, glass, iron...) > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From everson at evertype.com Tue Apr 4 12:47:06 2017 From: everson at evertype.com (Michael Everson) Date: Tue, 4 Apr 2017 18:47:06 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> Message-ID: On 4 Apr 2017, at 17:58, Mark Davis ☕️ 
wrote: > Amusing as this is, hard to believe that people are spending this much time on an April Fool's posting. I wondered how long it would take for someone to be taken in. The joke, of course, was hidden not inside the proposal, but inside the date. > I'm looking forward to similar postings on checkers You haven’t bothered to read the proposal, have you? > and go pieces. Gō notation is rather different and this kind of solution might not be appropriate for it. That, however, is a different problem unrelated to this proposal. > As a matter of fact, one that proposes adding new characters for every possible configuration of a go board would be imaginative. You really haven’t bothered to read the proposal, have you? > And I'm looking also forward to the ?+ZWJ+?? (etc) proposal. I recommend that you read the proposal before attempting to dismiss it. Michael Everson PS. Interested readers may wish to review some other proposals by myself and others. N4014 2011-04-01 was successful N4012 2011-04-01 was successful N4011 2011-04-01 was not successful* N3412 2008-04-01 was not successful N3066 2006-04-01 was successful N2935 2005-04-01 was successful N258A 2003-04-01 was not successful N2338 2001-04-01 was successful N2326 2001-04-01 was not successful *Though given recent symbol work by some it might be prudent to revive some part of this one. PPS: While games like chess, draughts, gō, and xiàngqí are pastimes, they are also complex intellectual pursuits which have amassed a sizeable literature over many centuries. Chess notation and chess diagrams are a good example. Kifu notation for gō is another. The UCS encodes characters which represent the pieces of many games. It is reasonable to expect that people may wish to use these characters to represent game data. Asmus’ idea that the 12 chess characters be duplicated or triplicated in order to set chess diagrams is wasteful of encoding space and not extensible either. 
We have seen that some 84 additional chess characters have been proposed; it would be a very bad idea to expand that to 168 or 252 characters. The appropriate way to respond to the great many differences in the ASCII-encoded existing chess fonts is to simply make use of existing characters in the standard to alter, in a systematic and standardized way, the glyph representation of the 12 already-encoded characters with 2 other already-encoded characters, as described in the proposal. Years ago a proposal similar to Asmus’ was made, in discussion if not in a formal document. The answer was “a higher level protocol would be best for chessboard notation”. Well, the simplest higher-level protocol for this is to use variation selectors to alter the font display, just as we use them for DIGIT ZERO, 16 Myanmar letters, INTERSECTION, UNION, SUBSET OF WITH NOT EQUAL TO, a bunch of other mathematical characters and more than 300 pictographs. Michael Everson From irgendeinbenutzername at gmail.com Tue Apr 4 12:53:29 2017 From: irgendeinbenutzername at gmail.com (Charlotte Buff) Date: Tue, 4 Apr 2017 19:53:29 +0200 Subject: Emoji Compatibility Symbols Message-ID: I am trying to reconstruct what the 66 emoji compatibility symbols that were included in some old drafts originally mapped to, but useful information on the web seems a bit sparse. It was fairly easy to figure out that compatibility symbols 1 through 16 eventually became proper characters (or sequences) and turned into ??, ??, ??, ??, ??, ????, ????, ????, ????, ????, ????, ????, ????, ????, ????, and ?. However, that still leaves 50 symbols that don't correspond to any Unicode characters. I did find this project that assigned names to private-use codepoints, and the related mappings from those codepoints to the different carrier sets . Unfortunately, I still don’t know what images or meanings were associated with those numbers. 
Searching for SoftBank emoji gave me a neatly organized list of 404 errors and KDDI was equally fruitless. Documents on the Unicode website itself regularly mention that EMS 17 through 66 are needed for round-trip mappings but never what these mappings actually were as far as I could find.

Does anybody have this information available?

From richard.wordingham at ntlworld.com Tue Apr 4 12:54:31 2017
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Tue, 4 Apr 2017 18:54:31 +0100
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <20170403220348.3efb4d1a@JRWUBU2> <421BC3D3-DF71-4D76-93E7-CAFDEDFBFFCB@evertype.com> <20170404004701.19ad750c@JRWUBU2>
Message-ID: <20170404185431.07dbe483@JRWUBU2>

On Tue, 4 Apr 2017 01:30:05 +0100
Michael Everson wrote:

> > I'm trying to work out whether we need a variation sequence for "chesspiece in a sentence".
>
> Of course! Haven't you ever seen chess problem texts? Check out the Fairy Chess proposal for encoding additional characters. Plenty of examples there.

Your examples did not have to contend with the possibility of fonts that only support the variants for drawing chessboards.

> Sorry, I meant "Of course **not**!" that is, chesspiece in a sentence is extremely common, and should be the default (not stylized) form. We can't repurpose that to be "chesspiece on a white square" because it hasn't been previously and changing that would affect the layout of existing data.

But would not your proposal make it legitimate for a font to supply only chess pieces on dark backgrounds for the chess piece characters?

Richard.
From markus.icu at gmail.com Tue Apr 4 13:46:39 2017
From: markus.icu at gmail.com (Markus Scherer)
Date: Tue, 4 Apr 2017 11:46:39 -0700
Subject: Emoji Compatibility Symbols
In-Reply-To:
References:
Message-ID:

There were some symbols, mostly proprietary logos, that we did not propose for encoding in Unicode. See pages 83-89 of http://www.unicode.org/L2/L2010/10132-emojidata.pdf

You could also mine the defunct symbols subcommittee page for more information: https://sites.google.com/site/unicodesymbols/Home/emoji-symbols

Best regards,
markus

From irgendeinbenutzername at gmail.com Tue Apr 4 15:02:23 2017
From: irgendeinbenutzername at gmail.com (Charlotte Buff)
Date: Tue, 4 Apr 2017 22:02:23 +0200
Subject: Emoji Compatibility Symbols
Message-ID:

Markus Scherer wrote:

> There were some symbols, mostly proprietary logos, that we did not propose for encoding in Unicode. See pages 83-89 of
> http://www.unicode.org/L2/L2010/10132-emojidata.pdf

That document was very helpful, but unfortunately many of the images are missing.

From everson at evertype.com Tue Apr 4 19:02:32 2017
From: everson at evertype.com (Michael Everson)
Date: Wed, 5 Apr 2017 01:02:32 +0100
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <20170404185431.07dbe483@JRWUBU2>
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <20170403220348.3efb4d1a@JRWUBU2> <421BC3D3-DF71-4D76-93E7-CAFDEDFBFFCB@evertype.com> <20170404004701.19ad750c@JRWUBU2> <20170404185431.07dbe483@JRWUBU2>
Message-ID: <07DD2CE0-5510-49A3-883A-EF7A1A34C80E@evertype.com>

On 4 Apr 2017, at 18:54, Richard Wordingham wrote:

> On Tue, 4 Apr 2017 01:30:05 +0100
> Michael Everson wrote:
>
>>> I'm trying to work out whether we need a variation sequence for "chesspiece in a sentence".
>>
>> Of course! Haven't you ever seen chess problem texts? Check out the Fairy Chess proposal for encoding additional characters. Plenty of examples there.
>
> Your examples did not have to contend with the possibility of fonts that only support the variants for drawing chessboards.

Um, what?

Why would anyone make a font that supports the variants for drawing chessboards (which require the encoded characters 2654..265F) not put in glyphs for those?

FontLab is the program I use to add OpenType features to my fonts, and if I try to add a sequence like 2654 + FE00 and the font doesn't have a 2654, it flags it as an error and insists that the character appear in the font. OK, someone could be perverse and not add glyphs to those code positions, but...

But nobody making a chess font with actual support for chess would do that. So this is another red herring. As far as I can see, your worries are groundless, and nothing has suggested that there's something wrong with the proposal.

Also, having implemented it in three or four different fonts now, I find that it works.
It does the job, and it's easy to use to edit.

>> Sorry, I meant "Of course **not**!" that is, chesspiece in a sentence is extremely common, and should be the default (not stylized) form. We can't repurpose that to be "chesspiece on a white square" because it hasn't been previously and changing that would affect the layout of existing data.
>
> But would not your proposal make it legitimate for a font to supply only chess pieces on dark backgrounds for the chess piece characters?

What does "legitimate" mean?

Nothing prevents someone from drawing the 16 Myanmar base characters with rings at the ends of their glyphs even though now VS are being recommended for that presentation. Is it legitimate to do that? Of course it is. It's legitimate to make Myanmar fonts with square glyphs rather than circular ones.

This proposal provides a stable encoding model for drawing chessboards simply, with fonts. Currently there are other fonts which do this, but they do not share encodings, and so sharing chessboard data is dependent on whether you have set up your board in the same font encoding that somebody else is using. Otherwise it doesn't work, and your text is corrupt and you have to re-key various elements in order to use the glyphs of the other font. This problem is described in detail at the beginning of the proposal. It is the same problem we had with ISO/IEC 8859-1, -2, -3, -4 etc. before we had the UCS.

So: we have unstable non-Unicode encodings for chessboards now, this proposal provides stable Unicode encodings. This can only benefit the community of users of chess fonts. Anybody who isn't setting chessboards is unaffected, just as I am unaffected by variation selectors used for glyph variation in mathematical fonts. (I might add the slashed zero glyph to Everson Mono, though.)
This proposal does this while leaving the base characters alone so they can be used as chesspieces in text (as they have been since Unicode 1.1) and by adding a mechanism to construct the glyphs necessary for presenting chessboard data.

This proposal uses a mechanism which has already been used for dozens of regular characters and 310 times for some popular pictographs. No new characters need to be added. Just a list of items in a text file.

Can you identify an actual problem?

Michael Everson

From gwalla at gmail.com Tue Apr 4 21:41:25 2017
From: gwalla at gmail.com (Garth Wallace)
Date: Tue, 4 Apr 2017 19:41:25 -0700
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <20170404004701.19ad750c@JRWUBU2>
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <20170403220348.3efb4d1a@JRWUBU2> <421BC3D3-DF71-4D76-93E7-CAFDEDFBFFCB@evertype.com> <20170404004701.19ad750c@JRWUBU2>
Message-ID:

On Mon, Apr 3, 2017 at 4:47 PM, Richard Wordingham <richard.wordingham at ntlworld.com> wrote:

> On Mon, 3 Apr 2017 23:35:52 +0100
> Michael Everson wrote:
>
> > On 3 Apr 2017, at 22:03, Richard Wordingham wrote:
>
> The relevant text before was,
>
> "I'm talking about looking for a U+2654 glyph for ordinary text when all the first font tried has is:
>
> 2654 FE01; Chesspiece on white; # WHITE CHESS KING
> 2654 FE02; Chesspiece on black; # WHITE CHESS KING"
>
> > > Should it give a glyph for U+2654 or not?
>
> > Of course. Why wouldn't it? It's a graphic character.
>
> What my conceptual example font has is not the sort of glyph one would want for sentences like "Alice ? d4 meets White Queen ? (with shawl)".
>
> > I don't see how anything you're saying either identifies or tried to solve any actual problem with the proposal.
> > The proposal says "put some substitution tables into your chess font to display a particular glyph" and some apps do that and some don't. You can't use VS with apps that don't.
>
> I'm trying to work out whether we need a variation sequence for "chesspiece in a sentence". We need the advice of someone who's worked on font fallback.
>
> You don't need substitution tables to be executed if your application can just look up glyphs for variation sequences.

I haven't worked on font fallback but maybe I can add something to this.

Honestly, I'm not sure we need to make a distinction between piece-on-light-square and piece-in-notation at the SVS level.

Currently, chess fonts can be (roughly) divided into "diagram fonts" and "notation fonts".

A diagram font:
- Is fixed-width (at least for the chess figurines themselves)
- Centers each figurine in the character cell
- Has a means of producing dark squares and on-dark-square equivalents of the figurines, either through separate allocation or a "combining dark square background" mechanism (usually a negative kerning hack)
- Usually has board border elements, and may have decimal digits and a subset of the lowercase Basic Latin alphabet for labeling ranks and files

A notation font:
- May be proportional
- Has figurines sitting on the baseline

(Neither is *required* for figurine notation. They just look nice.)

None of the features required for a diagram font are unacceptable in figurine notation: they are either irrelevant (dark squares, border elements) or acceptable visual variation (fixed width, vertical centering). Most chess fonts are of the diagram type, and figurines from diagram fonts may be (and frequently are) used in figurine notation.

A font with figurines sitting on the baseline would not be illegible in diagrams, just a bit clumsy-looking.
A proportional-width font would be unacceptable for proper typesetting of a diagram since board spaces would not line up properly, but would likely still be readable.

In addition, when figurines for notation and for diagrams are distinguished, they are distinguished above the character level, in runs of like type: rows of a diagram, or lines of figurine notation. This is not unlike proportional vs. tabular digits. A font that supported both could default to fixed-width figurines (the "safer" option) and provide proportional figurines through a stylistic set.

From gerrietm at icloud.com Tue Apr 4 22:00:25 2017
From: gerrietm at icloud.com (Gerriet M. Denkmann)
Date: Wed, 5 Apr 2017 10:00:25 +0700
Subject: Combining Class of Thai Nonspacing_Marks
In-Reply-To:
References:
Message-ID: <0D48F3F4-2410-4F80-974C-28EF8FF325A4@icloud.com>

> On 4 Apr 2017, at 00:00, Asmus Freytag wrote:
>
> It is not possible to construct a set of secure network identifiers based on simply
> a) ensuring the string is in NFC
> b) otherwise allowing all of the Thai characters (insofar as the they are PVALID in IDNA 2008 [RFC5892]).
>
> Considerable attention to allowable contexts is required. There is a group in Thailand working on this, but their results have not yet been made public.

Maybe this: Proposal for the Thai Script Root Zone Label Generation Rulesets

But the rules for Root Zone Labels are (rightly) much more restricted than what I want:

Any two strings which look (almost?) identical should be normalised into some canonical form.
Reason: not to have identical looking filenames in a filesystem.
With the current rules of normalisation there could be 8 different filenames all looking identical to ???????????????????.

E.g.:
- both NIKHAHIT + Sara Aa and Sara Am should be normalised into the same string (whatever this is)
- both top-vowel + tone-mark and tone-mark + top-vowel should be normalised into the same string (whatever this is).
etc.

If, as Richard Wordingham wrote: "Unicode combining classes cannot be changed. All that can be done is to enforce the order of characters in normalised text." then the Unicode Normalisation algorithms should be updated.

Kind regards,

Gerriet.

From gerrietm at icloud.com Tue Apr 4 22:45:43 2017
From: gerrietm at icloud.com (Gerriet M. Denkmann)
Date: Wed, 5 Apr 2017 10:45:43 +0700
Subject: Combining Class of Thai Nonspacing_Marks
In-Reply-To:
References:
Message-ID:

> On 4 Apr 2017, at 23:51, Richard Wordingham wrote:
>
> On Tue, 4 Apr 2017 09:39:57 +0700
> "Gerriet M. Denkmann" wrote:
>
>> So the rule should be:
>>
>> A consonant may have zero or one tone/other marks and also zero or one top/bottom vowels. Exceptions:
>> NIKHAHIT + tone mark (no top/bottom vowel)
>> MAITAIKHU + tone mark (no top/bottom vowel)
>
> This list is not exhaustive.
> The order of MAITAIKHU and tone mark is significant - it should affect rendering.

Most fonts disagree (exception: Tahoma and Microsoft Sans Serif).
Are there minority languages where the order has really a semantic meaning?

Could one create a list of all possible combinations of non-spacing marks for Thai, minority languages and languages written using Thai characters (e.g. Pali, Sanskrit, Khmer, Burmese, etc.)?
Including cases, where the order of these marks has a semantic meaning.

The next step would then be to agree on rules of normalisation.

For use in domain names, there probably need to be additional rules. This is not what I am concerned with.

The normalisation has (almost ?) nothing to do with the question of fonts.
E.g. ???
in both variants (vowel + mark and mark + vowel) look identical in about a dozen Thai fonts - the only exception being Apple's Thonburi font, which refuses to show the non-normalised form correctly.

If a font cannot correctly combine non-spacing marks, the font manufacturer should be notified.

Kind regards,

Gerriet.

From richard.wordingham at ntlworld.com Tue Apr 4 22:50:56 2017
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Wed, 5 Apr 2017 04:50:56 +0100
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <07DD2CE0-5510-49A3-883A-EF7A1A34C80E@evertype.com>
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <20170403220348.3efb4d1a@JRWUBU2> <421BC3D3-DF71-4D76-93E7-CAFDEDFBFFCB@evertype.com> <20170404004701.19ad750c@JRWUBU2> <20170404185431.07dbe483@JRWUBU2> <07DD2CE0-5510-49A3-883A-EF7A1A34C80E@evertype.com>
Message-ID: <20170405045056.309b67b4@JRWUBU2>

On Wed, 5 Apr 2017 01:02:32 +0100
Michael Everson wrote:

> On 4 Apr 2017, at 18:54, Richard Wordingham wrote:
> >
> > On Tue, 4 Apr 2017 01:30:05 +0100
> > Michael Everson wrote:
> >
> >>> I'm trying to work out whether we need a variation sequence for "chesspiece in a sentence".
> >>
> >> Of course! Haven't you ever seen chess problem texts? Check out the Fairy Chess proposal for encoding additional characters. Plenty of examples there.
> >
> > Your examples did not have to contend with the possibility of fonts that only support the variants for drawing chessboards.
>
> Um, what?
>
> Why would anyone make a font that supports the variants for drawing chessboards (which require the encoded characters 2654..265F) not put in glyphs for those?

A stop-gap font based on poor glyphs comes to mind.
> FontLab is the program I use to add OpenType features to my fonts, and if I try to add a sequence like 2654 + FE00 and the font doesn't have a 2654, it flags it as an error and insists that the character appear in the font. OK, someone could be perverse and not add glyphs to those code positions, but...

Is this a sequence for the GSUB table or for the cmap table? The font I have in mind would have no entry for U+2654 in its cmap format 4 subtable but would, following the proposal you put up on Saturday, have entries for and in its cmap format 14 subtable. This approach is entirely consistent with the conception of variation sequences as pseudo-encoding. I could make such a font (for one chesspiece), but it would take at least an evening.

Now, I have sought advice on the OpenType list, and have received the opinion of one person, to wit that if I have a glyph mapping for , I am obliged to have a genuine mapping for U+2654, i.e. not a map to .notdef. On the basis of this advice, the font writer therefore has three choices - expose his poor, possibly proportionally spaced glyphs for use as default U+2654 (probably the best choice), make the glyph for the default glyph (not a disastrous choice), or make the glyph for the default glyph (a malevolent choice in my view).

If one has no control over the fallback sequence for glyphs, arguably the situation for truly 'plain text', then the escape route for plain text is to have the font with good chess glyphs for use in running text declare that it has the glyph for such use. That requires the definition of a variation sequence to force the choice of suitable glyph.

Now, having to use variation sequences for chess pieces in plain text is unfortunate, but should also work with existing fonts supporting chess pieces. There would be transitional effects as existing fonts were modified to declare that they supported this variation sequence - the effects of font fallback would vary as the new fonts were added to the system.
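
The fallback concern can be made concrete with a minimal sketch. It assumes a toy model only: each font is reduced to a set of code points (standing in for a cmap format 4 subtable) and a set of (base, selector) pairs (standing in for a cmap format 14 subtable), and the renderer walks a fixed fallback list. The names `ToyFont` and `pick_font` are hypothetical, the selector values are taken from the draft listing quoted elsewhere in this thread, and real renderers differ in how they choose fallback fonts.

```python
from dataclasses import dataclass, field

WHITE_KING = 0x2654
# Selector values as in the draft listing quoted in this thread (an assumption).
VS_ON_WHITE = 0xFE01
VS_ON_BLACK = 0xFE02


@dataclass
class ToyFont:
    """Toy stand-in for a font: not a real font API."""
    name: str
    cmap: set = field(default_factory=set)  # code points with a default glyph
    uvs: set = field(default_factory=set)   # (base, selector) pairs with a glyph


def pick_font(fonts, base, selector=None):
    """Return the name of the first font able to render the request, else None."""
    if selector is not None:
        # Prefer a font that declares the variation sequence itself.
        for font in fonts:
            if (base, selector) in font.uvs:
                return font.name
    # Otherwise fall back on any font with a default glyph for the base.
    for font in fonts:
        if base in font.cmap:
            return font.name
    return None


board_only = ToyFont("board-only", uvs={(WHITE_KING, VS_ON_WHITE),
                                        (WHITE_KING, VS_ON_BLACK)})
text_font = ToyFont("text", cmap={WHITE_KING})
fallback_order = [board_only, text_font]

print(pick_font(fallback_order, WHITE_KING))               # bare piece in a sentence
print(pick_font(fallback_order, WHITE_KING, VS_ON_BLACK))  # piece on a dark square
```

In this model, as long as the board-only font has no default entry for U+2654, running text falls through to the text font. Once the board-only font is obliged to expose some default glyph, it wins for running text too whenever it comes first in the fallback order, which is the situation described above.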
> But nobody making a chess font with actual support for chess would do that.

Note that the font I have in mind is just supporting chessboards. The idea would be that other fonts would be used for high quality rendering.

> Nothing prevents someone from drawing the 16 Myanmar base characters with rings at the ends of their glyphs even though now VS are being recommended for that presentation. Is it legitimate to do that? Of course it is.

You seem to be declaring that it would not be wrong for chess piece characters in running text to be automatically depicted with dark chess square backgrounds.

> Can you identify an actual problem?

See above.

Richard.

From richard.wordingham at ntlworld.com Tue Apr 4 23:23:55 2017
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Wed, 5 Apr 2017 05:23:55 +0100
Subject: Combining Class of Thai Nonspacing_Marks
In-Reply-To: <0D48F3F4-2410-4F80-974C-28EF8FF325A4@icloud.com>
References: <0D48F3F4-2410-4F80-974C-28EF8FF325A4@icloud.com>
Message-ID: <20170405052355.002ead44@JRWUBU2>

On Wed, 5 Apr 2017 10:00:25 +0700
"Gerriet M. Denkmann" wrote:

> Any two strings which look (almost?) identical should be normalised into some canonical form. Reason: not to have identical looking filenames in a filesystem. With the current rules of normalisation there could be 8 different filenames all looking identical to ???????????????????.

> E.g. :
> - both NIKHAHIT + Sara Aa and Sara Am should be normalised into the same string (whatever this is)

I think the answer to this is for renderers to insert a dotted circle in the former. I hope no-one is going to argue that NIKHAHIT + SARA AA is appropriate for Sanskrit. NFKC is not the answer; NFKC(???) = ????.

> - both top-vowel + tone-mark and tone-mark + top-vowel should be normalised into the same string (whatever this is). etc.

TUS declares that ??? (vowel then tone mark) and ??? (tone mark then vowel) should render differently.
Unfortunately, there is a tendency for mark-to-mark positioning, if employed at all, to be restricted to combinations that actually occur in correctly spelt Thai. A particularly nasty example is that doubled vowels above can be indistinguishable from single vowels above. I got an angry response when I suggested that mark-to-mark positioning should be used for all combinations of marks above - allegedly it makes the GPOS tables 'too big'.

There's also the very high confusability of and . Traditionally, SARA UE is SARA I plus NIKHAHIT, and I suspect this is the origin of the etymologically odd form of ????? 'lingam'.

> If, as Richard Wordingham wrote: "Unicode combining classes cannot be changed. All that can be done is to enforce the order of characters in normalised text." then the Unicode Normalisation algorithms should be updated.

I think it will be a long time before canonical equivalence is replaced by canonical equivalence Version 2, but we may not have to wait many centuries. In the meantime, you will have to work with your own folding.

Richard.

From richard.wordingham at ntlworld.com Tue Apr 4 23:37:08 2017
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Wed, 5 Apr 2017 05:37:08 +0100
Subject: Combining Class of Thai Nonspacing_Marks
In-Reply-To:
References:
Message-ID: <20170405053708.1b0b9b3e@JRWUBU2>

On Wed, 5 Apr 2017 10:45:43 +0700
"Gerriet M. Denkmann" wrote:

> > On 4 Apr 2017, at 23:51, Richard Wordingham wrote:

> > The order of MAITAIKHU and tone mark is significant - it should affect rendering.

> Most fonts disagree (exception: Tahoma and Microsoft Sans Serif). Are there minority languages where the order has really a semantic meaning?

I think not. Most fonts are incompetent at displaying typing errors.

> Could one create a list of all possible combinations of non-spacing marks for Thai, minority languages and languages written using Thai characters (e.g. Pali, Sanskrit, Khmer, Burmese, etc.)?
> Including cases, where the order of these marks has a semantic meaning.

> The next step would then be to agree on rules of normalisation.

Most of the 'normalisation' is straightforward.

1) Repeatedly swap mark above and following mark below.

2) Apply Unicode normalisation.

Then

3) Use a font that uses mark-to-mark positioning on all combinations of vowels above and all combinations of vowel below.

NIKHAHIT followed by SARA AA needs special handling. I am not sure how well the general case will work - particularly with fonts that do their own reordering.

You also need to decide whether to fold and . I've started to see fonts make an artificial distinction.

You may wish to note that it can be very hard to tell the difference between U+002D HYPHEN-MINUS and U+2013 EN DASH in file names.

Richard.

From richard.wordingham at ntlworld.com Wed Apr 5 03:10:30 2017
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Wed, 5 Apr 2017 09:10:30 +0100
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <20170403203355.6cbfc184@JRWUBU2>
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2>
Message-ID: <20170405091030.116883a5@JRWUBU2>

On Mon, 3 Apr 2017 20:33:55 +0100
Richard Wordingham wrote:

> On Sun, 2 Apr 2017 10:43:39 -0700
> Asmus Freytag wrote:

> The basic text elements in the scheme other than boundary markers will be:
>
> empty white square
> empty black square
> white square with specific piece on it
> black square with specific piece on it.
>
> If the variation selectors are ignored, these simplify to:
>
> white square
> hatched square
> specific piece
>
> This preserves all the information; the pattern of squares is known in advance and therefore redundant.
Now, Asmus's VS scheme is:

empty white square
empty black square
piece with matching spacing
piece with dark background and matching spacing.

Now, what happens to the two schemes if rendered with yellow text ('foreground') on a blue background?

I believe the 'empty black square' will have yellow hatching on a blue background.

Will the empty white square be white or blue?

Will the 'piece with matching spacing' have a white background around the depiction of the piece, or a blue background? What of a 'white square with a specific piece on it'?

A piece with a *white* background is different to a piece that is merely an outline, whether filled or not.

Richard.

From lokedhs at gmail.com Wed Apr 5 03:18:30 2017
From: lokedhs at gmail.com (Elias Mårtenson)
Date: Wed, 5 Apr 2017 16:18:30 +0800
Subject: PETSCII mapping?
Message-ID:

I have been searching, trying to find some information as to why there is a large set of symbols in PETSCII which cannot be mapped to Unicode.

PETSCII is the character set used by the Commodore 64, which was an incredibly popular computer in the 80's, and still remains in use to this day. More information on the character set can be found on Wikipedia: https://en.wikipedia.org/wiki/PETSCII

Searching for PETSCII in the archives for this mailing list only reveals two messages, neither of which addresses my question.

Given that this is a well-documented character set, used by a very popular system, there must be some reason why these symbols are missing from Unicode. Could anyone provide some information on this?

Regards,
Elias

From asmusf at ix.netcom.com Wed Apr 5 04:49:52 2017
From: asmusf at ix.netcom.com (Asmus Freytag (c))
Date: Wed, 5 Apr 2017 02:49:52 -0700
Subject: Combining Class of Thai Nonspacing_Marks
In-Reply-To: <0D48F3F4-2410-4F80-974C-28EF8FF325A4@icloud.com>
References: <0D48F3F4-2410-4F80-974C-28EF8FF325A4@icloud.com>
Message-ID: <43594462-3b59-d97d-d0f8-a814e890946e@ix.netcom.com>

On 4/4/2017 8:00 PM, Gerriet M. Denkmann wrote:

>> On 4 Apr 2017, at 00:00, Asmus Freytag wrote:
>>
>> It is not possible to construct a set of secure network identifiers based on simply
>> a) ensuring the string is in NFC
>> b) otherwise allowing all of the Thai characters (insofar as the they are PVALID in IDNA 2008 [RFC5892]).
>>
>> Considerable attention to allowable contexts is required. There is a group in Thailand working on this, but their results have not yet been made public.
> Maybe this: Proposal for the Thai Script Root Zone Label Generation Rulesets

Just as long as you understand that it's not final, even for the problem domain it intends to address.

> But the rules for Root Zone Labels are (rightly) much more restricted than what I want:

One key difference is that the rules define a preferred ordering, and do not define a folding. Obviously, knowing a preferred ordering allows anyone to define a folding that results in that ordering.

Another generic difference between an LGR for network identifiers (Root Zone or otherwise) and filenames is that an LGR will tend to disallow pathological combinations, even if they are in an unambiguous order. "Pathological" combinations are those that result in unpredictable rendering - not just for a few isolated fonts, but across the board.
I would argue that for complex scripts, there may be a case for restricting filenames in a similar manner: expecting that any random combining sequence of unbounded length (up to the full filename) should be supported will surely lead to filenames that are impossible to tell apart; usually because they either do not get rendered in a sensible way, or things get clipped. This may even be the case for combining sequences in general.

LGRs, and the Root Zone LGR in particular, go one step further: they tend to explicitly exclude characters that are obsolete, rare, historic, special use, and so on; this is done for two main reasons: to keep the resulting names recognizable to the majority of users and to avoid the kinds of problems introduced by these characters.

For example, for Arabic, the consensus seems to be that for domain names, one really doesn't want to support the combining marks. They are not needed there, unlike general text, and only lead to a bewildering host of non-normalizable dual representations, for which otherwise a folding would have to be defined.

Finally, LGRs have some features that go beyond having a clean and focused repertoire and a defined ordering: those are the cases where two strings look identical, but neither can be construed as "preferred". In an LGR these strings can be made "mutually exclusive" using the blocked variant mechanism (see RFC 7940).

Some file systems have rudimentary forms of this, for example those that are case-preserving but not case-sensitive. Once a filename is used, its "variant" can no longer be added, but there's no a-priori folding into a preferred form.

Other than performance, perhaps, there's no reason a file system's valid file name space couldn't be described via RFC 7940. (Even with the full features of RFC 7940, collision checking can be implemented as an O(1) process for each new file name to be added to a folder).
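
A minimal sketch of such a collision check, assuming a much simpler model than RFC 7940: every name is reduced to a folded key, and a folder refuses a new name whose key is already present, while keeping the name exactly as the user typed it. The two Thai rules shown are only the two cases named earlier in this thread (NIKHAHIT + SARA AA versus SARA AM, and tone mark before versus after a vowel above); they are illustrative assumptions, not an agreed folding, and `fold_key` and `Folder` are hypothetical names.

```python
import re
import unicodedata

NIKHAHIT, SARA_AA, SARA_AM = "\u0E4D", "\u0E32", "\u0E33"

# A tone mark (U+0E48..U+0E4B) followed by a vowel above
# (U+0E31 or U+0E34..U+0E37); assumed here to fold to vowel-then-tone.
_TONE_THEN_VOWEL = re.compile("([\u0E48-\u0E4B])([\u0E31\u0E34-\u0E37])")


def fold_key(name: str) -> str:
    """Reduce a file name to a comparison key (illustrative rules only)."""
    key = unicodedata.normalize("NFC", name)
    key = key.replace(NIKHAHIT + SARA_AA, SARA_AM)
    key = _TONE_THEN_VOWEL.sub(r"\2\1", key)
    return key.casefold()


class Folder:
    """Keeps names as given, but blocks any name whose key is taken."""

    def __init__(self):
        self._by_key = {}

    def add(self, name: str) -> None:
        key = fold_key(name)
        if key in self._by_key:
            raise FileExistsError(
                f"{name!r} collides with existing {self._by_key[key]!r}")
        self._by_key[key] = name  # one dictionary lookup per new name

    def names(self):
        return list(self._by_key.values())
```

Because the stored name is whatever was added first, neither spelling is treated as "preferred"; the second one is simply blocked, much as in a case-preserving but case-insensitive file system.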
In addition to NFC, some additional foldings might be supplied to transform user input to valid file names (from case folding to some more complex folding like the one you are discussing). Like case-insensitive, non-preserving file systems, adding such foldings would return file names that can be different from the ones the user specified.

Again, whether or not you supply a folding is separate from defining a preferred ordering. For the latter, you might start with the work the Thai Generation Panel has been doing, so that valid network identifiers can immediately be valid file names.

A./

> Any two strings which look (almost?) identical should be normalised into some canonical form.
> Reason: not to have identical looking filenames in a filesystem.
> With the current rules of normalisation there could be 8 different filenames all looking identical to ???????????????????.
>
> E.g. :
> - both NIKHAHIT + Sara Aa and Sara Am should be normalised into the same string (whatever this is)
> - both top-vowel + tone-mark and tone-mark + top-vowel should be normalised into the same string (whatever this is).
> etc.
>
> If, as Richard Wordingham wrote: "Unicode combining classes cannot be changed. All that can be done is to enforce the order of characters in normalised text." then the Unicode Normalisation algorithms should be updated.
>
> Kind regards,
>
> Gerriet.
>
>

From asmusf at ix.netcom.com Wed Apr 5 05:05:16 2017
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Wed, 5 Apr 2017 03:05:16 -0700
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <20170405091030.116883a5@JRWUBU2>
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <20170405091030.116883a5@JRWUBU2>
Message-ID:

On 4/5/2017 1:10 AM, Richard Wordingham wrote:

> On Mon, 3 Apr 2017 20:33:55 +0100
> Richard Wordingham wrote:
>
>> On Sun, 2 Apr 2017 10:43:39 -0700
>> Asmus Freytag wrote:
>> The basic text elements in the scheme other than boundary markers will be:
>>
>> empty white square
>> empty black square
>> white square with specific piece on it
>> black square with specific piece on it.
>>
>> If the variation selectors are ignored, these simplify to:
>>
>> white square
>> hatched square
>> specific piece
>>
>> This preserves all the information; the pattern of squares is known in advance and therefore redundant.
> Now, Asmus's VS scheme is:
>
> empty white square
> empty black square
> piece with matching spacing
> piece with dark background and matching spacing.

Actually, I'm now leaning towards a preference for any scheme that does not use VS, but relies on ligatures. Such a scheme would need

a) no matching spacing for the bare pieces (the ligature with the empty square would result in the correct spacing)

b) no pieces with built-in dark background (pieces simply ligate with the empty "black" square).

> Now, what happens to the two schemes if rendered with yellow text ('foreground') on a blue background?

According to Michael, the effect should be that of lead typography.
This would mean that the entire ligature has the same ink color, and all parts that are not "ink" are the background color (paper color). Unlike lead typography, the ink can be perfectly opaque, allowing a lighter color to show on a dark background. Or the opacity of the foreground can be selected to an intermediate level, allowing the ink to look greenish in your example. > > I believe the 'empty black square' will have yellow hatching on a blue > back ground. > > Will the empty white square be white or blue? > > Will the 'piece with matching spacing' have a white background around > the depiction of the piece, or a blue background? What of a 'white > square with a specific piece on it'? > > A piece with a *white* background is different to a piece that is > merely an outline, whether filled or not. Unless you select an 'emoji_presentation' you do not get two-toned glyphs, therefore "white" is always the same as "transparent". This is true for anything in plain text, not just game pieces. If you want to have the dark squares have a blue background, but not the white squares, then you need to use markup to set the alternate background colors. (The results with a VS based system are not really different, because I imagine, the actual glyph repertoire is identical in all alternatives discussed so far - relying solely on ligatures has the benefit of not involving the UTC at all, therefore it could be implemented today without delay). A./ > > Richard. > From asmusf at ix.netcom.com Wed Apr 5 05:18:21 2017 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Wed, 5 Apr 2017 03:18:21 -0700 Subject: PETSCII mapping? In-Reply-To: References: Message-ID: <38d70a68-aabe-a6d1-50cf-cbdf2f92b88f@ix.netcom.com> An HTML attachment was scrubbed... 
URL:

From asmusf at ix.netcom.com Wed Apr 5 05:20:28 2017
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Wed, 5 Apr 2017 03:20:28 -0700
Subject: Combining Class of Thai Nonspacing_Marks
In-Reply-To: <20170405053708.1b0b9b3e@JRWUBU2>
References: <20170405053708.1b0b9b3e@JRWUBU2>
Message-ID: <2b25ede4-6c99-c1fa-ee96-d2b94415e114@ix.netcom.com>

An HTML attachment was scrubbed...
URL:

From everson at evertype.com Wed Apr 5 08:08:03 2017
From: everson at evertype.com (Michael Everson)
Date: Wed, 5 Apr 2017 14:08:03 +0100
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <20170405045056.309b67b4@JRWUBU2>
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <20170403220348.3efb4d1a@JRWUBU2> <421BC3D3-DF71-4D76-93E7-CAFDEDFBFFCB@evertype.com> <20170404004701.19ad750c@JRWUBU2> <20170404185431.07dbe483@JRWUBU2> <07DD2CE0-5510-49A3-883A-EF7A1A34C80E@evertype.com> <20170405045056.309b67b4@JRWUBU2>
Message-ID:

On 5 Apr 2017, at 04:50, Richard Wordingham wrote:

>> Why would anyone make a font that supports the variants for drawing chessboards (which require the encoded characters 2654..265F) not put in glyphs for those?
>
> A stop-gap font based on poor glyphs comes to mind.

For pity's sake. Yes, people can make and distribute crappy fonts. What are you on about? This is not serious criticism and what you suggest is no realistic scenario.

> Is this a sequence for the GSUB table or for the cmap table?

It's the OpenType table. I said this already, twice. The entries are:

    sub uni2654 uniFE00 by uni2654FE00 ;
    sub uni2654 uniFE01 by uni2654FE01 ;
    sub uni2655 uniFE00 by uni2655FE00 ;
    sub uni2655 uniFE01 by uni2655FE01 ;

and so on.
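The "and so on" can be spelled out mechanically. The sketch below generates the full set of substitution lines for U+2654..U+265F, following the glyph-naming pattern of the four entries quoted above; the function name `vs_substitutions` is hypothetical, and the output is only the list of `sub` lines, since the message does not say which feature block they sit in.

```python
def vs_substitutions(first=0x2654, last=0x265F, selectors=(0xFE00, 0xFE01)):
    """Build one 'sub' line per (chess piece, variation selector) pair.

    Glyph names follow the pattern of the entries quoted in the message:
    uniXXXX for the base glyph and uniXXXXFE0n for the substituted glyph.
    """
    rules = []
    for code_point in range(first, last + 1):
        for selector in selectors:
            base = f"uni{code_point:04X}"
            vs = f"uni{selector:04X}"
            rules.append(f"sub {base} {vs} by {base}{selector:04X} ;")
    return rules


rules = vs_substitutions()
for line in rules:
    print(line)
```

For the twelve chess pieces and two selectors this yields 24 lines, starting with the four shown in the message.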
> The font I have in mind would have no entry for U+2654 in its cmap format 4 subtable but would, following the proposal you put up on Saturday, have entries for and in its cmap format 14 subtable.

OK, to hell with the cmap format 14 table. I don't know what this is. I didn't edit any such table. I have text in my proposal about that because I took it from a proposal for Variation Sequences by Ken Lunde. I thought that was the same as the mechanism which I used to create the tables in my font, which worked, and worked well. I have implemented it. I would use my fonts (with variation selectors) in print and I would distribute the fonts.

PLEASE LOOK. You have said that you would add and and that's JUST what I have in my OpenType table. So you and I are doing the same thing.

> This approach is entirely consistent with the conception of variation sequences as pseudo-encoding.

No, it's a way of showing a particular glyph for an underlying base character.

> I could make such a font (for one chesspiece), but it would take at least an evening.

I've done it for three fonts already, Ludus, Condal, and William's Quest font.

> Now, I have sought advice on the OpenType list, and have received the opinion of one person, to wit that if I have a glyph mapping for
> , I am obliged to have a genuine mapping for U+2654, i.e. not a map to .notdef.

Didn't I say this just yesterday? U+2654 displays the character glyph on its own. With U+FE00 the font displays that glyph on an em-square sized background suitable for use as "chesspiece on light square" and with U+FE01 the font displays that glyph on an em-square sized background suitable for use as "chesspiece on dark square". A font which says that it is suitable to set chessboards has to have all three glyphs in it. In some fonts the first two MIGHT be the same but they do not HAVE to be because the "piece on a light square" may be designed to be inappropriately wide, or have the chesspiece glyph at a different height vis-à-vis the baseline, for the one glyph to suit both.

Is this unclear? I have tried to explain it clearly. Does this differ from whatever it is you're talking about? Because it sounds like it's just exactly what you're talking about.

> On the basis of this advice, the font writer therefore has three choices - expose his poor, possibly proportionally spaced glyphs for use as default U+2654 (probably the best choice), make the glyph for the default glyph (not a disastrous choice), or make the glyph for the default glyph (a malevolent choice in my view).

Look, Richard, I didn't invent variation selectors. Variation Selectors are used for lots of maths characters, a whole bunch of Myanmar characters, and 310 emojis. FOR NONE OF THOSE OTHER PROPOSALS has anybody wrung their hands saying that "oh gosh, there might be ugly glyphs in the font, so we shouldn't use Variation Selectors".

Plain chess pieces for use within text may often be proportionally-spaced, and there is nothing wrong or poor or undesirable about that. Anyone wanting to support such glyphs has been able to do so since Unicode 1.1. All right? That's the status quo for chess characters in the standard.

It's not possible to use proportionally-spaced fonts to set chessboards, because those have to have mono-width squares.

We could treble the number of chess characters in the standard from 12 to 36 (and from 84 to 252 for the current set of fairy chess characters in another proposal) but this isn't a good idea. We don't need 288 chess characters encoded. We need 96. *I* say this, and I'm a notorious splitter. First, in discussions with UTC representatives in previous years, I understand that this is not desirable. Second, it's not necessary, because variation selectors work well (I have implemented it as proof) and because board squares are, essentially, a special rendering shape (glyph) for the chess pieces.
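As a rough illustration of the three-way choice described in this message (bare piece, piece plus U+FE00 for a light square, piece plus U+FE01 for a dark square), here is a small sketch. The function names and the 0-based file/rank indexing are assumptions made for the example; empty squares are left out because the base characters used for them are not named in this part of the thread.

```python
# Variation selectors as described in the message: VS1 = light square, VS2 = dark square.
VS_LIGHT = "\uFE00"
VS_DARK = "\uFE01"
CHESS_PIECES = {chr(cp) for cp in range(0x2654, 0x2660)}  # U+2654..U+265F


def is_dark_square(file_index, rank_index):
    """Files a-h and ranks 1-8 are passed as 0-7; a1, i.e. (0, 0), is dark."""
    return (file_index + rank_index) % 2 == 0


def piece_on_square(piece, file_index, rank_index):
    """Return the piece followed by the selector matching its square colour."""
    if piece not in CHESS_PIECES:
        raise ValueError(f"not a chess piece character: {piece!r}")
    selector = VS_DARK if is_dark_square(file_index, rank_index) else VS_LIGHT
    return piece + selector


def strip_selectors(text):
    """What remains if the variation selectors are ignored: the bare pieces."""
    return text.replace(VS_LIGHT, "").replace(VS_DARK, "")


white_king = "\u2654"
on_e1 = piece_on_square(white_king, 4, 0)  # e1 is a dark square
on_d1 = piece_on_square(white_king, 3, 0)  # d1 is a light square
print([f"U+{ord(c):04X}" for c in on_e1 + on_d1])
```

Dropping the selectors leaves the ordinary U+2654..U+265F characters in place, which is why no extra code points are needed for the on-square forms: 12 pieces with two sequences each rather than 36 separate characters.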
> If one has no control over the fallback sequence for glyphs, arguably the situation for truly 'plain text', then the escape route for plain text is to have the font with good chess glyphs for use in running text declare that it has the glyph for such use.

Richard, I've shown some examples of the Looking-Glass problem where the VS sequences are ignored. Did you see these? Why don't you refer to them? You're talking in the abstract as though you haven't read the proposal or looked at the examples it gives.

In Figure 3, you can see the base glyphs in the font which might be used for any purpose. For the special purpose of setting a chessboard, you need to use the VS sequences. If your font or your app can't display that, that's a problem, but that is no different for ANY app or font that can't display fi/fl ligatures, or maths characters with VS, or Myanmar characters with VS, or emoji characters either in colour or black-and-white with or without ligatures.

Why is this such a dreadful problem for chess when it's not for any of the other character types which use VS sequences? You haven't explained this inchoate worry of yours.

My proposal admits freely that VS-derived glyphs for chessboards might fail in some environments (but also shows that a board set using this scheme is still legible by humans and parseable by software even if the good display may fail). But the point is that chess fonts are specialized fonts, and if people want to set chess problems they need special chess fonts. Such fonts exist right now, but only with conflicting ASCII encodings. THAT is worse than maybe some fonts having ugly glyphs or maybe some environments being unable to display OpenType sequences properly. THAT is the problem that is solved by this proposal.

> That requires the definition of a variation sequence to force the choice of suitable glyph.

These sequences are what the proposal gives. Why are you saying this?
> Now, having to use variation sequences for chess pieces in plain text is unfortunate,

No, it isn't. It's a better idea to use those for 96 chess characters than to have to get the committees to accept 288 chess characters (which I very much doubt they will) which by the way also puts off a solution for chess for another two years.

> but should also work with existing fonts supporting chess pieces.

It can, if and only if the makers of those fonts add the new glyphs and new VS sequences to their fonts. This is also true for maths fonts, for Myanmar fonts, and for emoji fonts.

> There would be transitional effects as existing fonts were modified to declare that they supported this variation sequence - the effects of font fallback would vary as the new fonts were added to the system.

Yes, that is what will happen if this proposal is selected. No fonts are magically altered. Are you supporting my proposal or objecting to it? I can't even tell any more.

>> But nobody making a chess font with actual support for chess would do that.
>
> Note that the font I have in mind is just supporting chessboards. The idea would be that other fonts would be used for high quality rendering.

WHAT? What does this mean? Who would make a font supporting just chessboards and not supporting display of chess characters otherwise?

Anyway you can't. You can't even MAKE a substitution table if the base characters aren't in the font. FontLab complains and then politely asks if you want to add the glyphs to the font, and when you say Yes (which you must if you want your OT sequences to work) then it puts those characters in the font so you can put glyphs into them. Is this explained clearly enough for you?

If you're worrying about whether the font designer will bother to make NICE glyphs for those characters, well, that is up to the wit of the font designer. And that has nothing to do with the proposal.
>> Nothing prevents someone from drawing the 16 Myanmar base characters
>> with rings at the ends of their glyphs even though now VS are being
>> recommended for that presentation. Is it legitimate to do that? Of
>> course it is.
>
> You seem to be declaring that it would not be wrong for chess piece characters in running text to be automatically depicted with dark chess square backgrounds.

If a font designer is perverse enough to do that, his font won't be nice and nobody will use it. It is possible for someone to depict chess piece characters in running text with dark chess square backgrounds, but since that's not the convention for doing so, nobody would like the font and nobody would use it.

>> Can you identify an actual problem?
>
> See above.

You failed to identify an actual problem with the proposal. The actual problem is (1) chess fonts aren't using Unicode characters (2) variation selectors can help provide a standardized way that enables chess fonts to do so and (3) the proposal gives a mechanism for doing that which will work in environments where VS substitution glyphs are supported.

Michael Everson

From wjgo_10009 at btinternet.com Wed Apr 5 07:22:46 2017
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Wed, 5 Apr 2017 13:22:46 +0100 (BST)
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <4919039.28328.1491394049217.JavaMail.root@webmail43.bt.ext.cpcloud.co.uk>
References: <4919039.28328.1491394049217.JavaMail.root@webmail43.bt.ext.cpcloud.co.uk>
Message-ID: <30044668.29483.1491394966847.JavaMail.defaultUser@defaultHost>

Asmus Freytag wrote:

> .... - relying solely on ligatures has the benefit of not involving the UTC at all, therefore it could be implemented today without delay).

I am wondering whether that is correct.
Where one implements a ligature using a ZWJ without the Unicode Technical Committee having agreed then that is fine where the meaning of the text is unchanged: for example, if one chooses to include, say, a pp ligature in a font. Yet to implement a ligature using a ZWJ where the meaning is changed, then I am wondering whether that needs the agreement of the Unicode Technical Committee. There have been some recent encodings where ZWJ has been used with two or more emoji characters to produce a new emoji character where the meaning of the result is different from the combined meanings of the ingredients, the meaning of that new character not always or maybe never being congruently obvious unless one already knows the meaning. If a ZWJ encoding for producing chess diagrams were to be introduced, then if it is not UTC that decides the detail, then who does decide? Would a non-UTC decision be interoperable, would it be supported? There are details that would need to be decided, such as how to produce an unoccupied white square and how to produce a white knight upon a white square. In my opinion a white knight upon a white square would need a ligature as the glyph might be different from a white knight in running text. It would be helpful if at the next UTC meeting the UTC could issue a statement clarifying with precision the situation over using ZWJ in this manner, maybe in relation to emoji as well. Going back to Michael's proposal, I like the proposal for the board and I hope that UTC accept it for inclusion in The Unicode Standard. I opine that the encoding should allow that the glyph for a white knight upon a white square is different from the glyph for the white knight that is used in running text. The advance widths of the two glyphs might be different each from the other, and the vertical position of the contours within the glyph may be different each glyph from the other. 
With regard to the border of the board I opine that it would improve the proposal if a variation selector also applied to the eight characters used for the border. This would mean that the glyphs for a chess board and its border could all be separate from the glyphs of other items in the font. This would mean that where there is an open source font available and licenced for making derivate fonts provided that the name of the font is changed then chess diagram glyphs and chess diagram border glyphs could be added to a font and satisfactory results obtained. On a different aspect of this thread, I have a metal type chess fount, bought in the 1960s. The fount is suitable for handsetting and printing chess diagrams. It was great fun, and changing a diagram after printing it by moving a knight was quite interesting as that process involved four pieces of type: removing a knight on one colour of square, putting an empty square of that colour in that place, removing an empty square of the other colour and then putting a knight of the same original knight colour on a square of that other square colour in its place. The knight did not take any other piece in that move. The fount was cast by the typefounder from matrices supplied by the Monotype corporation. The white squares were included in the fount, one does not have to rely on using normal spacing material for a white square. There were also four long thin typemetal pieces for the border. So here is a puzzle that results from that experience and yet also relates to the encoding of chess diagrams as in this thread. Suppose that one has a diagram for a valid position in a game of chess. Next one wants a diagram for the next valid position in the game. For the second diagram, first make a copy of the first diagram and then change some of the glyphs. How many glyphs need to be changed depends on the first position and the move that is made. 
Can you find examples, where 2, 3, 4, 5 and 6 different glyphs are involved: that is, the total number of both glyphs that are moved out from the diagram and glyphs that are moved in to the diagram? I have just made up that puzzle and I think that a result for one of the numerical values may not be possible, though I am not sure of that, but that all of the others are possible sometimes. With a metal chess fount, removed characters are carefully cleaned and then carefully placed back into the typecase. William Overington Wednesday 5 April 2017 From everson at evertype.com Wed Apr 5 09:38:01 2017 From: everson at evertype.com (Michael Everson) Date: Wed, 5 Apr 2017 15:38:01 +0100 Subject: Proposal to add standardized variation sequences for chess notation Message-ID: NOTE: A number of messages I sent in the last two days were scrubbed by the Unicode list software because they contained images. I will re-send these with links now. From: William_J_G Overington > Date: 2 April 2017 at 12:05:03 IST > I included the regular Unicode chess pieces themselves, and for each chess piece also versions on a white square and on a black square in the Private Use Area of my Quest text font. OK, I?m looking at this. William?s description uses rather different terms than mine does, so I?ll try to translate. First, he?s describing a font he made in 2004 or 2005, not an implementation of my proposal. > Free download of the Quest text font from http://www.users.globalnet.co.uk/~ngo/fonts.htm > > Thus, for, say, White King, there are three glyphs. Correct, just as my proposal would have it. And the metrics for e.g. white-king-for-use-in-text and white-king-for-use-in-chessboard are different. > The Quest text font has descenders, so that while the glyph for White King itself is sat on the baseline, the glyph for the White King on a white square has the chess piece positioned lower vertically. The background shading for White King on a black background goes down to the WinDescent level. 
By this he means more or less that the glyphs intended to produce a board have em-square metrics, while the base glyphs do not have square metrics and would be more suitable for in-text usage.

[Picture of William's three characters showing metrics]
http://evertype.com/standards/unicode-list/overington-glyphs.png

> Line spacing could be an issue, but it need not be as long as the OpenType-supporting application where the font is used has the facility to set type with no additional spacing. I use Serif PagePlus X7 and the facility is there, so diagrams look fine.
>
> I hope that Michael's proposal goes forward and is accepted.

So does Michael.

> Regarding the borders. I note that use of a variation selector is not suggested.

Nor should it be.

> As it happens, Quest text also has eight glyphs for producing a border, all eight being in the Private Use Area. They are rather ornate. They are at U+E5B0 through to U+E5B7.

They are there. I had to figure out how they should be used. They are put together in a very different way than the borders of any other font I have seen are. I am not sure, but I think he's intended to use them thus:

[Pic of the Looking-Glass board in William's font]
http://evertype.com/standards/unicode-list/overington-board.png

William's design is decidedly non-traditional, and not (to my eye) particularly easy to read, but it doesn't matter. The picture here shows his glyphs configured in exactly the same way as specified in my proposal. IT WORKS. (There are some hairline gaps in the border and the top left corner piece is a little less well aligned than one would want if one were preparing to ship the font.) The underlying text is just the same text that I used to set the Looking-Glass board in my proposal, variation selectors and all. There are no variation selectors used (or needed) for the border, though its glyphs are certainly unconventional horizontal lines. ;-)

> The empty squares in the chess diagram each use a variation selector.

I don't see how.
There were no OpenType instructions in the font or variation selector characters in the font.

> I opine that it would be helpful if a variation selector were to be used for each of the eight border items.

No, because this would lead to potentially infinite variety of borders within any font, and it would be better to restrict this. I wouldn't even want a VS to distinguish a single-line border from a double-line border, and there's an enormous variety of ornamental borders one could put on the glyphs for

> Using a variation selector would mean that a diagram could be produced without relying on the basic designs of the eight character sorts used to produce the border and also would allow a stylish border design to be included in a font.

A chess font is best when optimized to the design the designer wants, but honestly, the model proposed is simple and robust and does not need more tinkering or more complexity. It is able to support William's design as well as the more traditional ones in the proposal while remaining parseable plain text. That should be enough. That is what takes the mess that current non-Unicode chess fonts are in and normalizes them for use.

> Best regards,
>
> William Overington

Thank you for sharing your font, William. I'll send you the ttf of this one so you can tinker with glyph placement as you wish, if the proposal is accepted and the standardized variation sequences accepted.

Michael Everson
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From everson at evertype.com Wed Apr 5 09:48:16 2017
From: everson at evertype.com (Michael Everson)
Date: Wed, 5 Apr 2017 15:48:16 +0100
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References:
Message-ID:

Kent,

I can't read this in a plain-text e-mail. I can't paste it into an ordinary word-processor like Word as in my previous response to Markus, or in Pages (left) or LibreOffice (right) as shown here.
(I simply pasted in the text from Word to each of those. It's odd to see that there is some variation in the display of the text without selecting it and applying the correctly-configured font to it, but when that's done, the correct display is given (modulo some leading issues which I didn't focus on in either).)

The workaround you give is just that. It works. It's not usefully portable or user-friendly, and as higher-level protocols go, it hasn't swept away all competition for presenting chessboards. People use ASCII or MS Symbol-based fonts not even with any Unicode characters in them.

http://evertype.com/standards/unicode-list/libreoffice-lg.png
http://evertype.com/standards/unicode-list/pages-lg.png

> On 3 Apr 2017, at 19:46, Kent Karlsson wrote:
>
> On 2017-04-03 19:51, "markus.icu at gmail.com" wrote:
>
> > It seems to me that higher-level layout (e.g, HTML+CSS) is appropriate for the
> > board layout (e.g., via a table), board frame style, and cell/field shading.
> > In each field, the existing characters should suffice.
> >
> > markus
>
> True, and one can easily find an example online.
>
> Slightly modified from http://stackoverflow.com/questions/18505921/chess-using-tables

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From everson at evertype.com Wed Apr 5 09:49:41 2017
From: everson at evertype.com (Michael Everson)
Date: Wed, 5 Apr 2017 15:49:41 +0100
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <20170405091030.116883a5@JRWUBU2>
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <20170405091030.116883a5@JRWUBU2>
Message-ID:

On 5 Apr 2017, at 09:10, Richard Wordingham wrote:

> Now, what happens to the two schemes if rendered with yellow text ('foreground') on a blue background?

The same thing that happens to ANY graphic character if you choose to render the background as blue and the text as yellow.

> I believe the 'empty black square' will have yellow hatching on a blue background.

Well, it is good that you believe this.

> Will the empty white square be white or blue?

It will be blue, obviously.

> Will the 'piece with matching spacing' have a white background around the depiction of the piece, or a blue background? What of a 'white
> square with a specific piece on it'?

This isn't a problem and has nothing to do with my proposal.

> A piece with a *white* background is different to a piece that is merely an outline, whether filled or not.

I don't think I can consider your comments to be relevant to the proposal any longer. You don't even address the proposal.

Oh, here is the answer to your question. It took me 15 seconds to change the background and text colour in Quark XPress. It has nothing to do with the proposal for variation sequences.
http://evertype.com/standards/unicode-list/looking-glass-yellow-blue.png

Michael Everson
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From gwalla at gmail.com Wed Apr 5 09:52:16 2017
From: gwalla at gmail.com (Garth Wallace)
Date: Wed, 05 Apr 2017 14:52:16 +0000
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <175BB7CB-08BA-4E80-8337-7EBCCB90B141@evertype.com>
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <20170403220348.3efb4d1a@JRWUBU2> <421BC3D3-DF71-4D76-93E7-CAFDEDFBFFCB@evertype.com> <20170404004701.19ad750c@JRWUBU2> <175BB7CB-08BA-4E80-8337-7EBCCB90B141@evertype.com>
Message-ID:

On Wed, Apr 5, 2017 at 7:14 AM Michael Everson wrote:

> Argh, Garth... please don't shoot down our own proposal...

I'm not, I'm just saying that if having symbols without VS not match either of the VSes is a sticking point, it's not hard to work around.

> > On 5 Apr 2017, at 03:41, Garth Wallace wrote:
> >
> > I haven't worked on font fallback but maybe I can add something to this.
> >
> > Honestly, I'm not sure we need to make a distinction between
> > piece-on-light-square and piece-in-notation at the SVS level.
>
> Yes, we do, if we want the data to be well formed.
>
> > Currently, chess fonts can be (roughly) divided into "diagram fonts" and
> > "notation fonts".
>
> That's not true. There are some which do all three.

There are, sure. I said roughly: many don't do both & rely on font-switching.
> > A diagram font:
> > - Is fixed-width (at least for the chess figurines themselves)
> > - Centers each figurine in the character cell
> > - Has a means of producing dark squares and on-dark-square equivalents
> > of the figurines, either through separate allocation or a "combining dark
> > square background" mechanism (usually a negative kerning hack)
> > - Usually has board border elements, and may have decimal digits and a
> > subset of the lowercase Basic Latin alphabet for labeling ranks and files
>
> Gods, no. I was hoping to avoid the digits and letters for now. We mustn't
> scare them.

Well, they're just letters and digits. They aren't treated specially.

> > None of the features required for a diagram font are unacceptable in
> > figurine notation:
>
> The white ones may be too wide for use in text.

Not visually ideal, but legible.

> > In addition, when figurines for notation and for diagrams are
> > distinguished, they are distinguished above the character level, in runs of
> > like type: rows of a diagram, or lines of figurine notation.
>
> No, that's spelling.
>
> Michael

Sigh.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From otto.stolz at uni-konstanz.de Wed Apr 5 09:56:10 2017 From: otto.stolz at uni-konstanz.de (Otto Stolz) Date: Wed, 5 Apr 2017 16:56:10 +0200 Subject: Encoding of old compatibility characters In-Reply-To: <3cf59c63-ee7e-a805-d8d3-84b1597b20e7@ix.netcom.com> References: <92ba6970-86e1-5d80-e3c9-239283a384b0@gmail.com> <41b2170a-6efb-518d-8c02-3881fbb09bae@kli.org> <2ba990ce-9d57-4e8b-b4dd-e9f1a821cd3b@gmail.com> <4q7f39oed2.fsf@chem.ox.ac.uk> <2d2b2a87-f4d8-7f28-59de-f6cf7437c9c5@ix.netcom.com> <7e7af7d6-dfc4-159a-832f-e60f24136b0f@gmail.com> <83fuht6fqg.fsf@gnu.org> <838tngp6cm.fsf@gnu.org> <3cf59c63-ee7e-a805-d8d3-84b1597b20e7@ix.netcom.com> Message-ID: <6152360a-6df1-d8b2-786a-aa54c7a843f4@uni-konstanz.de> Helo, Am 31.03.2017 um 09:57 schrieb Eli Zaretskii: > Arial Unicode MS supports that character [U+23E8], FWIW. From: Otto Stolz Date: Tue, 4 Apr 2017 15:21:02 +0200 > Not on my good ole Wndows XP SP3 system. On 4/4/2017 7:58 AM, Eli Zaretskii wrote: > This here is also XP SP3. Maybe some package I have installed updated > the font? Am 04.04.2017 um 18:51 schrieb Asmus Freytag: > AFAIK, this font is / was installed by MS Office. I have got MS Word 2002 and MS Excel 2000. Maybe, later versions bring an amended version of Arial Unicode MS. Cheers, otto From everson at evertype.com Wed Apr 5 10:21:30 2017 From: everson at evertype.com (Michael Everson) Date: Wed, 5 Apr 2017 16:21:30 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <20170403220348.3efb4d1a@JRWUBU2> <421BC3D3-DF71-4D76-93E7-CAFDEDFBFFCB@evertype.com> <20170404004701.19ad750c@JRWUBU2> <175BB7CB-08BA-4E80-8337-7EBCCB90B141@evertype.com> Message-ID: <7E2DEDE2-A1D1-4937-85F6-DD0AB6432431@evertype.com> On 5 Apr 2017, at 15:52, Garth Wallace wrote: > [?] 
I'm just saying that if having symbols without VS not match either of the VSes is a sticking point, it's not hard to work around.

Oh, I see. Well, yes, I agree with you in part. But here's the thing. It is *permissible* for proportional-inline-chesspieces to be identical to emsquare-chessboard-chesspiece if a designer *wants* to do it that way. But it is *just* as permissible for proportional-inline-chesspieces to be truly proportional and unsuitable for chessboard typesetting (and that's how it has been since Unicode 1.1).

Look, here is a choice:

U+2654 - WHITE CHESS KING whose width might or might not be
U+2654 FE00 - WHITE CHESS KING whose glyph is a white/light em-square for chessboards
U+2654 FE01 - WHITE CHESS KING whose glyph is a black/dark em-square for chessboards

I think this is enough. Or it could be:

U+2654 - WHITE CHESS KING whose width might or might not be
U+2654 FE00 - WHITE CHESS KING whose glyph is the same as the unmodified U+2654, whatever it is
U+2654 FE01 - WHITE CHESS KING whose glyph is a white/light em-square for chessboards
U+2654 FE02 - WHITE CHESS KING whose glyph is a black/dark em-square for chessboards

There's some precedent for this, where some symbols have one VS for "text glyph" and a different VS for "emoji glyph" and of course the unmodified symbol can be used and will display as the font has it. I don't think the second is necessary. It's not necessary for this, for example:

U+0030 - DIGIT ZERO
U+0030 FE00 - short diagonal stroke form
U+0030 FE0E - text style
U+0030 FE0F - emoji style

OK, "text style" is identical to unmodified U+0030, but the only reason that attribute exists is in distinction to "emoji style". Compare also:

U+1000 - MYANMAR LETTER KA
U+1000 FE00 - dotted form

>>> Currently, chess fonts can be (roughly) divided into "diagram fonts" and "notation fonts".
>>
>> That's not true. There are some which do all three.
>
> There are, sure. I said roughly: many don't do both & rely on font-switching.
But even more of them can’t rely on font-switching because the encoding of the piece on light and dark chessboard varies from supplier to supplier. All current chess fonts are ASCII hacks.

>>> None of the features required for a diagram font are unacceptable in figurine notation:
>>
>> The white ones may be too wide for use in text.
>
> Not visually ideal, but legible.

Yes, but if we were to unify unmodified chesspieces with the pieces on white squares it could invalidate the metrics of text like

http://evertype.com/standards/unicode-list/34-variantim.png

As I say, it’s *permissible* to have the unmodified chesspiece glyph be the same as the white-square chesspiece glyph, but it’s not obligatory, and we must preserve font designer choice here.

Michael Everson

From asmusf at ix.netcom.com Wed Apr 5 10:21:51 2017
From: asmusf at ix.netcom.com (Asmus Freytag (c))
Date: Wed, 5 Apr 2017 08:21:51 -0700
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <30044668.29483.1491394966847.JavaMail.defaultUser@defaultHost>
References: <4919039.28328.1491394049217.JavaMail.root@webmail43.bt.ext.cpcloud.co.uk> <30044668.29483.1491394966847.JavaMail.defaultUser@defaultHost>
Message-ID:

On 4/5/2017 5:22 AM, William_J_G Overington wrote:
> Asmus Freytag wrote:
>
>> .... - relying solely on ligatures has the benefit of not involving the UTC at all, therefore it could be implemented today without delay).
>
> I am wondering whether that is correct.
>
> Where one implements a ligature using a ZWJ without the Unicode Technical Committee having agreed then that is fine where the meaning of the text is unchanged: for example, if one chooses to include, say, a pp ligature in a font.
>
> Yet to implement a ligature using a ZWJ where the meaning is changed, then I am wondering whether that needs the agreement of the Unicode Technical Committee.
>
> There have been some recent encodings where ZWJ has been used with two or more emoji characters to produce a new emoji character where the meaning of the result is different from the combined meanings of the ingredients, the meaning of that new character not always or maybe never being congruently obvious unless one already knows the meaning.
>
> If a ZWJ encoding for producing chess diagrams were to be introduced, then if it is not UTC that decides the detail, then who does decide? Would a non-UTC decision be interoperable, would it be supported?

There's no need to use a ZWJ, because there's no existing other use of a square before a chess piece that needs to be preserved.

A./

PS: I assume it's safe to ignore the rest of your message, being based on a wrong premise?

From asmusf at ix.netcom.com Wed Apr 5 10:25:33 2017
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Wed, 5 Apr 2017 08:25:33 -0700
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <20170405091030.116883a5@JRWUBU2>
Message-ID: <57079982-d48c-69aa-6195-20fe08b332e3@ix.netcom.com>

An HTML attachment was scrubbed...
From asmusf at ix.netcom.com Wed Apr 5 10:26:09 2017
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Wed, 5 Apr 2017 08:26:09 -0700
Subject: Encoding of old compatibility characters
In-Reply-To: <6152360a-6df1-d8b2-786a-aa54c7a843f4@uni-konstanz.de>
References: <92ba6970-86e1-5d80-e3c9-239283a384b0@gmail.com> <41b2170a-6efb-518d-8c02-3881fbb09bae@kli.org> <2ba990ce-9d57-4e8b-b4dd-e9f1a821cd3b@gmail.com> <4q7f39oed2.fsf@chem.ox.ac.uk> <2d2b2a87-f4d8-7f28-59de-f6cf7437c9c5@ix.netcom.com> <7e7af7d6-dfc4-159a-832f-e60f24136b0f@gmail.com> <83fuht6fqg.fsf@gnu.org> <838tngp6cm.fsf@gnu.org> <3cf59c63-ee7e-a805-d8d3-84b1597b20e7@ix.netcom.com> <6152360a-6df1-d8b2-786a-aa54c7a843f4@uni-konstanz.de>
Message-ID: <2e0e609c-5516-f055-cd85-05c5fbe65963@ix.netcom.com>

> I have got MS Word 2002 and MS Excel 2000.
> Maybe, later versions bring an amended version of Arial Unicode MS.

Maybe.

A./

From everson at evertype.com Wed Apr 5 10:37:38 2017
From: everson at evertype.com (Michael Everson)
Date: Wed, 5 Apr 2017 16:37:38 +0100
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <57079982-d48c-69aa-6195-20fe08b332e3@ix.netcom.com>
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <20170405091030.116883a5@JRWUBU2> <57079982-d48c-69aa-6195-20fe08b332e3@ix.netcom.com>
Message-ID: <9525A452-37B8-4190-86B7-604FDCCC3E3A@evertype.com>

> On 5 Apr 2017, at 16:25, Asmus Freytag wrote:
>
>> http://evertype.com/standards/unicode-list/looking-glass-yellow-blue.png
>
> This matches the reply I gave Richard.

Very nice. 15 seconds’ work, too.

> I think you could achieve the same with using just ligatures (no VS) and get the same result when using a proper font.
No, because yours isn’t as well thought-out in terms of the structure of plaintext chessboard data. (Probably only because I’ve been working on this with real fonts for a good while now.) See my next e-mail.

Michael Everson

From everson at evertype.com Wed Apr 5 11:02:44 2017
From: everson at evertype.com (Michael Everson)
Date: Wed, 5 Apr 2017 17:02:44 +0100
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <20170405091030.116883a5@JRWUBU2>
Message-ID: <762B29A1-3161-41DD-82C6-3CB07B86EC91@evertype.com>

On 5 Apr 2017, at 11:05, Asmus Freytag wrote:

> Actually, I'm now leaning towards a preference for any scheme that does not use VS, but relies on ligatures.

This would make editing the text more difficult and would yield less legible results in environments where the ligatures aren’t supported.

> Such a scheme would need
> a) no matching spacing for the bare pieces (the ligature with the empty square would result in the correct spacing)

Well, that’s no different at all from my scheme, except you ligate pawn and empty square as I ligate pawn and VS. But your scheme has the disadvantage of being similar to the emoji sequences, which would appear to require ZWJ between the pawn and the empty square. That means you have more characters to deal with and in fact you end up with variable-length chessboard lines, which yields the worst possible results in fallback.

> b) no pieces with built-in dark background (pieces simply ligate with the empty "black" square).

Or as I have it, pawn and VS.

>> Now, what happens to the two schemes if rendered with yellow text ('foreground') on a blue background?
>
> According to Michael, the effect should be that of lead typography.
Well, that’s not really what I was talking about with lead typography. (That’s more the ASCII-art argument.)

> This would mean that the entire ligature has the same ink color, and all parts that are not "ink" are the background color (paper color).

Yes, paper and ink. As in

http://evertype.com/standards/unicode-list/looking-glass-yellow-blue.png

> Unlike lead typography, the ink can be perfectly opaque, allowing a lighter color to show on a dark background.

Or the opacity of the foreground can be selected to an intermediate level, allowing the ink to look greenish in your example. In any case this is a red herring.

> (The results with a VS based system are not really different, because I imagine, the actual glyph repertoire is identical in all alternatives discussed so far - relying solely on ligatures has the benefit of not involving the UTC at all, therefore it could be implemented today without delay).

Except that ligation is problematic for actually making chessboards. The risk that fallback becomes illegible is hugely magnified. Here:

http://evertype.com/standards/unicode-list/ligation-vs-VS.png

On the left we have your scheme, shown in a mono-width font; on the right, mine. Ligation, in fallback, will lead to variable-width text on each of the eight lines, which will differ depending on how many chess pieces, if any, appear.

With the VS solution, *all* chess data will have the same number of characters in each line. In fact, parsers could identify misplaced VS characters (VS1 where VS2 would have to be there) or missing ones. Moreover, reverse-parsers (or whatever the term could be) could take narrative text data as in:

http://evertype.com/standards/unicode-list/34-variantim.png

and generate tables from it (if the narrative data were well-formed).

All the UTC has to do is approve the set of VS sequences as a *standardized* way of doing this.
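As an illustration of the parser point above, here is a minimal Python sketch of such a check. It is not taken from the proposal document. It assumes the first of the two assignments listed earlier in the thread (VS1 = U+FE00 for a light square, VS2 = U+FE01 for a dark square), assumes that every square, including an empty one, is written as one base character plus one variation selector, and uses U+25A1 as a purely hypothetical base character for an empty square. The names `check_board` and `empty_row` are invented for this example.

```python
VS1 = "\uFE00"    # assumed: glyph on a white/light em-square
VS2 = "\uFE01"    # assumed: glyph on a black/dark em-square
EMPTY = "\u25A1"  # hypothetical base character for an empty square
PIECES = {chr(cp) for cp in range(0x2654, 0x2660)}  # U+2654..U+265F chess symbols


def check_board(rows):
    """Return a list of (row, column, message) problems in a VS-encoded board.

    Row 0 is rank 8 and column 0 is file a, so square (0, 0) is light.
    Each row must be 8 squares of base character + variation selector,
    i.e. exactly 16 code points.
    """
    problems = []
    for r, row in enumerate(rows):
        if len(row) != 16:
            # A dropped selector shows up here as a wrong line length.
            problems.append((r, None, f"expected 16 code points, got {len(row)}"))
            continue
        for c in range(8):
            base, vs = row[2 * c], row[2 * c + 1]
            expected = VS1 if (r + c) % 2 == 0 else VS2
            if base not in PIECES and base != EMPTY:
                problems.append((r, c, f"unexpected base U+{ord(base):04X}"))
            if vs not in (VS1, VS2):
                problems.append((r, c, "missing variation selector"))
            elif vs != expected:
                problems.append((r, c, "wrong variation selector for square colour"))
    return problems


def empty_row(r):
    """Build one well-formed row of empty squares for rank index r."""
    return "".join(EMPTY + (VS1 if (r + c) % 2 == 0 else VS2) for c in range(8))
```

Because every well-formed line has the same length, a missing selector is caught by the length test alone, and a swapped VS1/VS2 is caught by comparing against the colour implied by the square's position.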
Ad-hoc ligation is just going to lead to continued chaos, as well as continued dependence on differently-encoded ASCII fonts.

Michael Everson

From beckiergb at gmail.com Wed Apr 5 11:42:25 2017
From: beckiergb at gmail.com (Rebecca Bettencourt)
Date: Wed, 5 Apr 2017 09:42:25 -0700
Subject: PETSCII mapping?
In-Reply-To: <38d70a68-aabe-a6d1-50cf-cbdf2f92b88f@ix.netcom.com>
References: <38d70a68-aabe-a6d1-50cf-cbdf2f92b88f@ix.netcom.com>
Message-ID:

On Wed, Apr 5, 2017 at 3:18 AM, Asmus Freytag wrote:

> Unicode is not an archive of anything ever used on computers.

Why not? Isn't one of Unicode's goals to support the conversion of documents using legacy character sets into Unicode? I do not understand why, say, the entire IBM PC character set is eligible for encoding, but not the entire Commodore 64 character set.

Were there word processors on the Commodore 64 that allowed the input of PETSCII characters? Could documents written using that software demonstrate a need to encode those characters? What about instruction manuals, magazine articles, and program listings that used PETSCII characters in running text? Surely there must be more than enough examples for a computer as popular as the Commodore 64.

From wjgo_10009 at btinternet.com Wed Apr 5 11:28:04 2017
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Wed, 5 Apr 2017 17:28:04 +0100 (BST)
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References: <4919039.28328.1491394049217.JavaMail.root@webmail43.bt.ext.cpcloud.co.uk> <30044668.29483.1491394966847.JavaMail.defaultUser@defaultHost>
Message-ID: <10737369.47241.1491409684891.JavaMail.defaultUser@defaultHost>

Asmus Freytag wrote:

> There's no need to use a ZWJ, because there's no existing other use of a square before a chess piece that needs to be preserved.
Well, whether or not there is a need to use a ZWJ is not the issue here.

Asmus wrote before:

> > > .... - relying solely on ligatures has the benefit of not involving the UTC at all, therefore it could be implemented today without delay).

I then asked, though worded differently from how it is worded here, whether UTC needs to be involved where a character sequence that contains one or more ZWJ characters generates a glyph with a meaning different from the meaning of the original sequence that did not have the one or more ZWJ characters included.

For example, p ZWJ p produces a pp ligature with no change of meaning.

For example, WOMAN ZWJ ROCKET produces a glyph for a LADY ASTRONAUT, thus a change of meaning, and I think that it went to UTC as there was a change of meaning, but I am not congruently sure of that.

SQUARE ZWJ CHESSPIECE or CHESSPIECE ZWJ SQUARE produces a CHESSPIECE ON A SQUARE, thus a change of meaning.

So the question is not about the chess encoding but about the original comment that claimed " - relying solely on ligatures has the benefit of not involving the UTC at all, therefore it could be implemented today without delay).".

> PS: I assume it's safe to ignore the rest of your message, being based on a wrong premise?

Well, not a wrong premise. Actually the rest of the post was about other aspects as well as that question, including some text about my experience with a metal chess fount and a puzzle that I hope that you will enjoy.
William Overington

Wednesday 5 April 2017

From everson at evertype.com Wed Apr 5 12:29:33 2017
From: everson at evertype.com (Michael Everson)
Date: Wed, 5 Apr 2017 18:29:33 +0100
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <10737369.47241.1491409684891.JavaMail.defaultUser@defaultHost>
References: <4919039.28328.1491394049217.JavaMail.root@webmail43.bt.ext.cpcloud.co.uk> <30044668.29483.1491394966847.JavaMail.defaultUser@defaultHost> <10737369.47241.1491409684891.JavaMail.defaultUser@defaultHost>
Message-ID: <0FA171D1-97FD-4628-AAF5-40351B6034A7@evertype.com>

On 5 Apr 2017, at 17:28, William_J_G Overington wrote:

> Well, whether there is a need to use a ZWJ or no need to use a ZWJ is not here the issue.

There isn’t. We should use VS just as we do with maths and Myanmar characters.

> I then asked, the question worded differently from how it is worded here, about whether UTC needs to be involved where a character sequence that contains one or more ZWJ characters generates a glyph with a meaning different from the meaning of the original sequence that did not have the one or more ZWJ characters included.

The proposal has been made for Standardized Variation Sequences.

> For example, p ZWJ p produces a pp ligature with no change of meaning.

A ZWJ is not necessary to produce a pp ligature.

> For example, where WOMAN ZWJ ROCKET produces a glyph for a LADY ASTRONAUT, thus a change of meaning and I think that it went to UTC as there was a change of meaning but I am not congruently sure of that..

That is a matter of emoji, which is not “normal” symbol usage and is not really analogous to what we are discussing here.

> SQUARE ZWJ CHESSPIECE or CHESSPIECE ZWJ SQUARE produces a CHESSPIECE ON A SQUARE, thus a change of meaning.

No, it’s not. CHESSPIECE is still CHESSPIECE.
The glyph for CHESSPIECE needs to be altered in order to make it suitable to use the characters in a way which will permit the presentation and interchange of chessboard matrices.

Michael Everson

From verdy_p at wanadoo.fr Wed Apr 5 14:13:57 2017
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Wed, 5 Apr 2017 21:13:57 +0200
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <10737369.47241.1491409684891.JavaMail.defaultUser@defaultHost>
References: <4919039.28328.1491394049217.JavaMail.root@webmail43.bt.ext.cpcloud.co.uk> <30044668.29483.1491394966847.JavaMail.defaultUser@defaultHost> <10737369.47241.1491409684891.JavaMail.defaultUser@defaultHost>
Message-ID:

2017-04-05 18:28 GMT+02:00 William_J_G Overington :

> For example, where WOMAN ZWJ ROCKET produces a glyph for a LADY ASTRONAUT,
> thus a change of meaning and I think that it went to UTC as there was a
> change of meaning but I am not congruently sure of that..
>
> SQUARE ZWJ CHESSPIECE or CHESSPIECE ZWJ SQUARE produces a CHESSPIECE ON A
> SQUARE, thus a change of meaning.

You're right here. The absence of ZWJ clearly means separate symbols side by side (whether they will align vertically or match their metrics is not relevant here), but we already see that this is a problem for displaying actual boards with the "method" proposed by Michael Everson for use in plain text, which to me looks like only a hack (not a serious encoding proposal), just as if we were replacing all German sharp s letters by Greek beta letters, only because they more or less "look the same".

You can perfectly well have a board displayed beside normal text which may contain some chess pieces, not intended to combine with the surrounding board, even if both symbols may also appear side by side (with independent metrics) in text paragraphs.

Given what has been encoded for other emoji, ZWJ should be used between symbols that are supposed to combine visually (such as MAN+WOMAN).
The encoding should still respect the logic, just like we do in normal scripts (independently of the fact they may have different visual ordering/layout, or could have similar glyphs properly disunified because of their needed distinct semantic properties).

Note also that these "chess pieces" are not intended to be used only with chess, and various board types may be used (not only with square cells; for example, there are rectangular ones or triangular ones for Shogi pieces in Japan, the cell colors also have their own meanings, and special boards may have their own cells changing colors to add other rules).

Note that Shogi has other pieces with distinct semantics. The pieces are generally flat and can be turned to the other side to show their promotion. Traditional pieces use cursive Kanji, but there are modernised **variants** using linear glyph shapes, or westernized shapes with Latin letters or geometric symbols, or even reusing the chess pieces (including the Queen for the Gold General; or the King for the Jewel/Jade General/Master and for its "White" Challenger), but making distinctions between horses (horses-dragoons) and cavalry. When promoting using chess pieces, the promotion may be shown by placing the chess piece on top of a draughts piece or coin/token. Coins/tokens are used to promote pawns (just stack two pieces as in the game of draughts).
From everson at evertype.com Wed Apr 5 14:32:44 2017
From: everson at evertype.com (Michael Everson)
Date: Wed, 5 Apr 2017 20:32:44 +0100
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References: <4919039.28328.1491394049217.JavaMail.root@webmail43.bt.ext.cpcloud.co.uk> <30044668.29483.1491394966847.JavaMail.defaultUser@defaultHost> <10737369.47241.1491409684891.JavaMail.defaultUser@defaultHost>
Message-ID: <5F9BCF6C-D351-4DFF-A972-2B251B4282CF@evertype.com>

It’s wonderful that Mr Verdy opposes my proposal. I must be doing something right.

On 5 Apr 2017, at 20:13, Philippe Verdy wrote:

> 2017-04-05 18:28 GMT+02:00 William_J_G Overington :
> For example, where WOMAN ZWJ ROCKET produces a glyph for a LADY ASTRONAUT, thus a change of meaning and I think that it went to UTC as there was a change of meaning but I am not congruently sure of that..
>
> SQUARE ZWJ CHESSPIECE or CHESSPIECE ZWJ SQUARE produces a CHESSPIECE ON A SQUARE, thus a change of meaning.
>
> You're right here. The absence of ZWJ clearly means separate symbols side by side

Wrong. ZWJ has no particular directional semantics.

> (whether they will align vertically or match their metrics is not relevant here but we already see that this is a problem for displaying actual boards with the "method" proposed by Michael Everson for use in plain text,

I have no trouble whatsoever making use of the three prototype fonts which make use of variation selectors to set chessboards of various sizes and with pieces anywhere I need them to be. The proposal document clearly shows examples of the boards, set with the fonts using the substitutions I specify. What, then, is the problem for display?

> which just looks for me as only a hack (not a serious encoding proposal),

It is quite serious. It solves a long-standing problem which everyone has ignored.
> just as if we were replacing all German sharp s letters by Greek beta letters, only because they more or less "look the same".

Lovely! A completely random analogy that has nothing whatsoever to do with this proposal.

> You can perfectly have a board displayed beside normal text which may contain some chess pieces, not intended to combine with the surrounding board, even if both symbols may also appear side by side (with independent metrics) in text paragraphs.

Yes, Mr Verdy. That’s just exactly what my proposal says. You can use one font, with some extra glyphs attained by use of VS, to set chesspieces in text and to set chessboards alongside them. All using Unicode characters, not competing ASCII encodings which prevent harmonization of chessboard data now. There’s even an example of this in my proposal. Perhaps you didn’t read it. Can you find the Figure I refer to?

> Given what has been encoded for other Emojis, ZWJ should be used between symbols that are supposed to combine visually (such as MAN+WOMAN).

Chess characters aren’t emojis.

> The encoding should still respect the logic,

The logic of the use of VS in this proposal is no different from the logic used with them in maths, or in Myanmar, or even in some emoji.

> just like we do in normal scripts (independently of the fact they may have different visual ordering/layout, or could have similar glyphs properly disunified because of their needed distinct semantic properties).

A pawn is a pawn is a pawn. Sometimes I need the glyph for a pawn to appear in a certain way in order to do something nice like set a chessboard.

> Note also that these "chess pieces" are not just intended to be used only with chesses,

If there are other uses which can be made of chess pieces, then those uses can be investigated in due course by someone interested in that.

> and various board types may be used (not only with square cells, for example there are rectangular ones or triangular for Shogi pieces in Japan,

Shogi is not chess.
Shogi notation is not like chess notation, either. Try to focus on the actual proposal.

> the cell colors also have their own meanings, and special boards may have their own cells changing colors to add other rules).

Red herring. This has nothing to do with the PRIMARY USE of chess characters, which is inline in text to describe chess problems in various notations, and also to set chessboard diagrams.

> Note that Shogi has other pieces with distinct semantics.

Shogi isn’t chess.

> The pieces are generally flat and can be tuned to the other side to show their promotion. Traditional pieces use cursive Kanjis, but there are modernised **variants** using linear glyph shapes, or westernized shapes with Latin letters or geometric symbols, or even reusing the chess pieces (including the Queen for the Gold General; or the King for the Jewel/Jade General/Master and for its "White" Challenger), but making distinctions between horses (horses-dragoons) and cavalry. When promoting using chess pieces, the promotion may be shown by placing the chess piece.on top of a draught piece or coin/token. Coins/tokens are used to promote pawns (just stack two pieces like in draught game).

Shogi isn’t chess.

I thank Mr Verdy for his defence of my proposal.

Michael Everson

From wjgo_10009 at btinternet.com Wed Apr 5 14:45:57 2017
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Wed, 5 Apr 2017 20:45:57 +0100 (BST)
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <11942252.59673.1491420041683.JavaMail.root@webmail43.bt.ext.cpcloud.co.uk>
References: <11942252.59673.1491420041683.JavaMail.root@webmail43.bt.ext.cpcloud.co.uk>
Message-ID: <19941195.63023.1491421557775.JavaMail.defaultUser@defaultHost>

>> As it happens, Quest text also has eight glyphs for producing a border, all eight being in the Private Use Area. They are rather ornate. They are at U+E5B0 through to U+E5B7.

Michael Everson wrote:

> They are there.
> I had to figure out how they should be used. They are put together in a very different way than the borders of any other font I have seen are. I am not sure, but I think he’s intended to use them thus:

> [Pic of the Looking-Glass board in William’s font]

> http://evertype.com/standards/unicode-list/overington-board.png

Yes, Michael has set out the border as I intended it to be used. Thank you.

> William’s design is decidedly non-traditional, and not (to my eye) particularly easy to read, but it doesn’t matter. The picture here shows his glyphs configured in exactly the same way as specified in my proposal. IT WORKS. (There are some hairline gaps in the border and the top left corner piece is a little less well aligned than one would want if one were preparing to ship the font.)

Thank you for producing the picture.

Yes, there are some hairline gaps in the border. It happens in some places when using the font in PagePlus X7 here: they appear to be rounding errors in the rendering system. Maybe I can try to make the glyphs for the two left side border corners and the upper and lower border horizontals each a bit wider than the advance width. Setting the line spacing in the application program a little less than it should be for the font size might stop any vertical hairlines without altering the font, if indeed altering the font vertically would work anyway; I am unsure at present whether it would or not.

However, the issue with the top left corner piece is not a font issue, and that issue does not occur when using the font with PagePlus X7.

If I do alter the font, or make a variant version, then I will need to check what happens if glyphs overlap when producing a PDF document before finalizing anything.

> Thank you for sharing your font, William. I’ll send you the ttf of this one so you can tinker with glyph placement as you wish, if the proposal is accepted and the standardized variation sequences accepted.

Thank you.
William Overington

Wednesday 5 April 2017

From verdy_p at wanadoo.fr Wed Apr 5 15:26:01 2017
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Wed, 5 Apr 2017 22:26:01 +0200
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <5F9BCF6C-D351-4DFF-A972-2B251B4282CF@evertype.com>
References: <4919039.28328.1491394049217.JavaMail.root@webmail43.bt.ext.cpcloud.co.uk> <30044668.29483.1491394966847.JavaMail.defaultUser@defaultHost> <10737369.47241.1491409684891.JavaMail.defaultUser@defaultHost> <5F9BCF6C-D351-4DFF-A972-2B251B4282CF@evertype.com>
Message-ID:

2017-04-05 21:32 GMT+02:00 Michael Everson :

> It’s wonderful that Mr Verdy opposes my proposal. I must be doing
> something right.
>
> On 5 Apr 2017, at 20:13, Philippe Verdy wrote:
>
> > 2017-04-05 18:28 GMT+02:00 William_J_G Overington <wjgo_10009 at btinternet.com>:
> > For example, where WOMAN ZWJ ROCKET produces a glyph for a LADY
> > ASTRONAUT, thus a change of meaning and I think that it went to UTC as
> > there was a change of meaning but I am not congruently sure of that..
> >
> > SQUARE ZWJ CHESSPIECE or CHESSPIECE ZWJ SQUARE produces a CHESSPIECE ON
> > A SQUARE, thus a change of meaning.
> >
> > You're right here. The absence of ZWJ clearly means separate symbols
> > side by side
>
> Wrong. ZWJ has no particular directional semantics.

NO! I did not give any direction. Direction is a separate issue (if you mean the Bidi algorithm there). "Side by side" does not prohibit ligatures; it is just like letters "side by side", where ordering is defined independently.

So I maintain what I replied.
The **absence** of ZWJ clearly means separate symbols side by side (minus typographic/styling effects such as joining and **partial** overlays or kerning: this excludes overlays and complex ligatures that are in Unicode treated with separate encodings; partial overlays include kerning, or simple syllabic composition in 2D layouts for Hangul, Kana/Romaji squares, but excludes complex compositions for Hanzi/Kanji which are encoded specifically).

From beckiergb at gmail.com Wed Apr 5 15:35:45 2017
From: beckiergb at gmail.com (Rebecca Bettencourt)
Date: Wed, 5 Apr 2017 13:35:45 -0700
Subject: PETSCII mapping?
In-Reply-To:
References: <38d70a68-aabe-a6d1-50cf-cbdf2f92b88f@ix.netcom.com>
Message-ID:

You can find charts of complete PETSCII character sets here:

http://www.kreativekorp.com/software/fonts/c64.shtml

The missing characters are a handful of block elements: upper fractional blocks (Unicode only has lower), halves of MEDIUM SHADE, checkerboards and diagonals.

I can put together a unified chart, with mappings to Unicode where they exist. In fact I think I'll do that. :)

I'm all willing to help put together a proposal for encoding missing block element characters, but I would need other people to a) gather evidence of use in plain text and b) write up the proposal in Unicode's formal language since I've never proposed characters to Unicode before.

(Additionally, I wonder if we could find evidence of the Apple II's or TRS-80's characters in use in plain text as well. Not necessarily saying those should be encoded as well, just that we should investigate.)

-- Rebecca Bettencourt

On Wed, Apr 5, 2017 at 12:47 PM, Murray Sargent <murrays at exchange.microsoft.com> wrote:

> What PETSCII characters aren’t already in Unicode? A couple geometric
> symbols? Looks mostly like a simple codepage translation.
>
> Murray
>
> From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Rebecca Bettencourt
> Sent: Wednesday, April 5, 2017 9:42 AM
> To: Asmus Freytag
> Cc: unicode
> Subject: Re: PETSCII mapping?
>
> On Wed, Apr 5, 2017 at 3:18 AM, Asmus Freytag wrote:
>
> Unicode is not an archive of anything ever used on computers.
>
> Why not? Isn't one of Unicode's goals to support the conversion of
> documents using legacy character sets into Unicode? I do not understand
> why, say, the entire IBM PC character set is eligible for encoding, but not
> the entire Commodore 64 character set.
>
> Were there word processors on the Commodore 64 that allowed the input of
> PETSCII characters? Could documents written using that software demonstrate
> a need to encode those characters? What about instruction manuals, magazine
> articles, and program listings that used PETSCII characters in running
> text? Surely there must be more than enough examples for a computer as
> popular as the Commodore 64.

From kent.karlsson14 at telia.com Wed Apr 5 16:13:44 2017
From: kent.karlsson14 at telia.com (Kent Karlsson)
Date: Wed, 05 Apr 2017 23:13:44 +0200
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
Message-ID:

On 2017-04-05 16:48, "Michael Everson" wrote:

> Kent, I can’t read this in a plain-text e-mail.

Well, it was SUPPOSED to be explicit HTML code in the email. It was NOT the intent that the given example was to be rendered directly in the email (even if you have HTML emails enabled).

Further, I would write the code a bit differently, in order to easily be able to map your proposed encoding for (parts of) chessboards to HTML. But at this point I did not want to change the referenced example (written by someone posting to stackoverflow.com) in any significant way.
So yes, if you want to see the result of the HTML code, paste the HTML code into a plain text editor, name the file you save it to "chess.html", and view that file in a browser. That display in turn may be cut and pasted to another document, depending on the capabilities of the app used to edit that other document. The paste may, admittedly, result in an awful and uneditable result.

I agree that the HTML code is a bit of a mouthful (and I would also do it a bit differently), and also has the problem mentioned in the previous paragraph. Which is why I support your proposal, but with these modifications:

- with the extra requirement to have VSs also for the border line drawing characters (to make them fit for drawing chess board borders, in a general purpose font), and
- some bidi fix [preferably making the box/border drawing characters bidi "L", if possible; otherwise a caveat that if there is an expectation to paste such a board into an RTL document, bidi controls need to be used to LTR the board].

Nit: You sometimes seem to have made the line spacing slightly larger (like 2 points) than the character width. Should they not be exactly the same, to get the best (square) display of the chess boards? (Not that it is very visible, but a bit.)

/Kent K

PS
I think the "ligatures" approach is a dead end.

- As you mention, the fallback will have very different line lengths for the lines of a board display, and thus be basically unreadable.
- If ZWJ is not needed, one will need two *new* characters that (in some fonts) ligate with chess pieces. No existing character should ever ligate with chess pieces.
- If ZWJ is needed, then one can use some existing characters as board squares.
- In either case, it is not clear (or obvious) which should come first, a chess piece or a board square. There will surely be mistakes, giving them in the wrong order (not a problem in your proposal).
- My personal guesstimate is that there will be far fewer fonts that would implement the ligation (if that approach was to be chosen), than would implement the VS approach you are suggesting. Thus I support your proposal, since that gives: - Good fallback (readable, though ugly). - Fairly good display when the VS sequences are interpreted (and the font is otherwise reasonable), and "good" context (line height setting, not too short lines so that auto line breaking is avoided, ...). - Easier to machine parse than the ligatures approach; and MUCH easier to parse than an HTML version. - Easy to convert to (say) HTML for even better display in (say) HTML pages (CAN look much better, and NO dependence on line height setting or line width setting (or bidi direction derivations), but just that the table (for the board) is reasonably done). Den 2017-04-05 16:48, skrev "Michael Everson" : > Kent, I can’t read this in a plain-text e-mail. I can’t paste it into an > ordinary word-processor like Word as in my previous response to Markus, or in > Pages (left) or LibreOffice (right) as shown here. (I simply pasted in the > text from Word to each of those. It’s odd to see that there is some variation > in displaying the text without selecting it and applying the correctly-configured > font to it, but when that’s done, the correct display is given (modulo some > leading issues which I didn’t focus on in either). > > The workaround you give is just that. It works. It’s not usefully portable or > user-friendly, and as higher-level protocols go, it hasn’t swept away all > competition for presenting chessboards. People use ASCII or MS Symbol-based > fonts not even with any Unicode characters in them. -------------- next part -------------- An HTML attachment was scrubbed... URL: From 637275 at gmail.com Wed Apr 5 16:25:52 2017 From: 637275 at gmail.com (Rebecca T) Date: Wed, 5 Apr 2017 17:25:52 -0400 Subject: PETSCII mapping?
In-Reply-To: <38d70a68-aabe-a6d1-50cf-cbdf2f92b88f@ix.netcom.com> References: <38d70a68-aabe-a6d1-50cf-cbdf2f92b88f@ix.netcom.com> Message-ID: > If there's a credible need to convert files between Unicode-based systems and > those using PETSCII There is! It’s called “sharing textual information” and it’s how our society functions. Can we afford to blithely abandon data from the best-selling computer in history [1] because nobody cared to standardize its? > A similar scenario might exist if C64 emulators run on Unicode-based systems > were a widespread phenomenon They do! Even last month, there was a PETSCII directory-art contest. [2] A bit off-topic, but: As time goes on, “not in widespread use” will become a flimsier and flimsier argument against inclusion. Why isn’t there a larger community of PETSCII enthusiasts? Partially because the only way to share PETSCII is through images! The consortium (passively or actively) prevents communication through exclusion and then uses the lack of communication as a justification against inclusion; it’s a poor, tautological argument, and it won’t serve the consortium long-term. Simply put, we need new criteria for inclusion: as the vast majority of the world’s systems (from written communication in text messages to the manuscripts of all new books) are already Unicode-based, we can no longer rely on a character’s existing presence outside of Unicode as a signal to warrant inclusion; we must weigh a character’s merits and usability on their own. (Does it fill a gap in communication? Will it be used?) [1]: http://www.cnn.com/2011/TECH/gaming.gadgets/05/09/commodore.64.reborn/ [2]: http://csdb.dk/event/?id=2558 -------------- next part -------------- An HTML attachment was scrubbed...
URL: From richard.wordingham at ntlworld.com Wed Apr 5 16:48:17 2017 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Wed, 5 Apr 2017 22:48:17 +0100 Subject: Coloured Punctuation and Annotation In-Reply-To: References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <20170405091030.116883a5@JRWUBU2> Message-ID: <20170405224817.4149f845@JRWUBU2> In topic 'Proposal to add standardized variation sequences for chess notation', on Wed, 5 Apr 2017 03:05:16 -0700 Asmus Freytag wrote: > On 4/5/2017 1:10 AM, Richard Wordingham wrote: > > A piece with a *white* background is different to a piece that is > > merely an outline, whether filled or not. > Unless you select an 'emoji_presentation' you do not get two-toned > glyphs, therefore "white" is always the same as "transparent". This > is true for anything in plain text, not just game pieces. Where does this come from? I tried to read it from UTS#51 'Unicode Emoji', which is not part of TUS, but I couldn't deduce that a font that enables U+10B99 PSALTER PAHLAVI SECTION MARK to have exactly two (as opposed to none or four) red dots is in breach of the guidelines therein. Are we really going to have to set up Psalter Pahlavi emoji? There's also some encoded Ethiopic punctuation that certainly used to have red dots. I think the emoji database has overlooked an entire script of emoji - the Egyptian hieroglyphs! Richard. From asmusf at ix.netcom.com Wed Apr 5 17:14:02 2017 From: asmusf at ix.netcom.com (Asmus Freytag (c)) Date: Wed, 5 Apr 2017 15:14:02 -0700 Subject: PETSCII mapping? 
In-Reply-To: References: <38d70a68-aabe-a6d1-50cf-cbdf2f92b88f@ix.netcom.com> Message-ID: <14e28283-bc59-096b-5b1b-5a4124fb66c0@ix.netcom.com> On 4/5/2017 2:25 PM, Rebecca T wrote: > > If there's a credible need to convert files between Unicode-based systems and > > those using PETSCII > > There is! It’s called “sharing textual information” and it’s how our society > functions. Can we afford to blithely abandon data from the best-selling > computer in history [1] because nobody cared to standardize its? There's no need for inflammatory rhetoric. If you believe there is a credible need, then it should be easy to document that as part of a proposal. Nothing gets decided by the UTC unless there's a proposal on the table. A./ > > > A similar scenario might exist if C64 emulators run on Unicode-based systems > > were a widespread phenomenon > > They do! Even last month, there was a PETSCII directory-art contest. [2] > > A bit off-topic, but: > > As time goes on, “not in widespread use” will become a flimsier and flimsier > argument against inclusion. Why isn’t there a larger community of PETSCII > enthusiasts? Partially because the only way to share PETSCII is through images! > The consortium (passively or actively) prevents communication through exclusion > and then uses the lack of communication as a justification against inclusion; > it’s a poor, tautological argument, and it won’t serve the consortium long-term. > > Simply put, we need new criteria for inclusion: as the vast majority of the > world’s systems (from written communication in text messages to the manuscripts > of all new books) are already Unicode-based, we can no longer rely on a > character’s existing presence outside of Unicode as a signal to warrant > inclusion; we must weigh a character’s merits and usability on their own. (Does > it fill a gap in communication? Will it be used?)
> > [1]: > http://www.cnn.com/2011/TECH/gaming.gadgets/05/09/commodore.64.reborn/ > [2]: http://csdb.dk/event/?id=2558 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From asmusf at ix.netcom.com Wed Apr 5 17:16:43 2017 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Wed, 5 Apr 2017 15:16:43 -0700 Subject: Coloured Punctuation and Annotation In-Reply-To: <20170405224817.4149f845@JRWUBU2> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <20170405091030.116883a5@JRWUBU2> <20170405224817.4149f845@JRWUBU2> Message-ID: <6883042b-8247-4c89-a47c-eb699cd6e590@ix.netcom.com> Do you have any examples of plain text that is rendered with parts of characters having white (opaque) background? I'm not aware of any, A./ On 4/5/2017 2:48 PM, Richard Wordingham wrote: > In topic 'Proposal to add standardized variation sequences for chess > notation', on Wed, 5 Apr 2017 03:05:16 -0700 > Asmus Freytag wrote: > >> On 4/5/2017 1:10 AM, Richard Wordingham wrote: >>> A piece with a *white* background is different to a piece that is >>> merely an outline, whether filled or not. >> Unless you select an 'emoji_presentation' you do not get two-toned >> glyphs, therefore "white" is always the same as "transparent". This >> is true for anything in plain text, not just game pieces. > Where does this come from? I tried to read it from UTS#51 'Unicode > Emoji', which is not part of TUS, but I couldn't deduce that a font > that enables U+10B99 PSALTER PAHLAVI SECTION MARK to have exactly two > (as opposed to none or four) red dots is in breach of the guidelines > therein. Are we really going to have to set up Psalter Pahlavi emoji? > There's also some encoded Ethiopic punctuation that certainly used to > have red dots. 
> > I think the emoji database has overlooked an entire script of emoji - > the Egyptian hieroglyphs! > > Richard. > From jameskasskrv at gmail.com Wed Apr 5 18:12:16 2017 From: jameskasskrv at gmail.com (James Kass) Date: Wed, 5 Apr 2017 15:12:16 -0800 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: Message-ID: Kent Karlsson wrote, > - with the extra requirement to have VSs also for the border line > drawing characters (to make them fit for drawing chess board > borders, in a general purpose font), and This doesn't seem necessary. A general purpose font modified to display the chess board in plain text in accordance with Michael Everson's proposal would be expected to use the same metrics as the box drawing glyphs for all of the VS-produced glyphs. A general purpose font *not* so modified would not be expected to display the chessboard in a perfect square, anyway. (Yet the display would still be legible.) Best regards, James Kass From everson at evertype.com Wed Apr 5 18:25:44 2017 From: everson at evertype.com (Michael Everson) Date: Thu, 6 Apr 2017 00:25:44 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: Message-ID: <6490CD43-45AF-40C0-9AB4-A2F8937DFF4E@evertype.com> On 5 Apr 2017, at 22:13, Kent Karlsson wrote: > > Kent, I can’t read this in a plain-text e-mail. > > Well, it was SUPPOSED to be explicit HTML code in the email. It was NOT the intent that the given example was to be > rendered directly in the email (even if you have HTML emails enabled). Oh, you misunderstood me. I knew it was raw HTML. I didn’t expect it to render. But it was meaningless code. The proposal for standardized variation sequences for chess treats it as text. Whether that text is analogous to ASCII art or not is irrelevant. The proposal solves a problem, giving good visual fallback, and excellent rendering if properly employed.
It’s incredibly simple and uses > I agree that the HTML code is a bit of a mouthful (and I would also do it a bit differently), and also has the problem > mentioned in the previous paragraph). Which is why I support your proposal, but with these modifications: > > - with the extra requirement to have VSs also for the border line drawing characters (to make them fit for > drawing chess board borders, in a general purpose font), and Look, if that’s the price I’d have to pay to move forward with this I would. I don’t think it’s necessary. I *do* think that the definition of the resulting glyph “suitable for the chess glyphs in this font that supports... Oh, here. This is what I would add.
2581 FE00; Chessboard box drawing; # LOWER ONE EIGHTH BLOCK
258F FE00; Chessboard box drawing; # LEFT ONE EIGHTH BLOCK
2594 FE00; Chessboard box drawing; # UPPER ONE EIGHTH BLOCK
2595 FE00; Chessboard box drawing; # RIGHT ONE EIGHTH BLOCK
2596 FE00; Chessboard box drawing; # QUADRANT LOWER LEFT
2597 FE00; Chessboard box drawing; # QUADRANT LOWER RIGHT
2598 FE00; Chessboard box drawing; # QUADRANT UPPER LEFT
259D FE00; Chessboard box drawing; # QUADRANT UPPER RIGHT
I guess I see your point. It does no harm, especially if the font might possibly be used for graphics terminal emulation. ;-) > - some bidi fix [preferably making the box/border drawing characters bidi "L", if possible; otherwise a caveat that > if there is an expectation to paste in such a board into an RTL document, bidi controls need be used to LTR the board]). I don’t know if there is a problem here and am not able to offer a solution if there is. I don’t object to a solution, if there is a problem. > Nit: You sometimes seem to have made the line spacing slightly larger (like 2 points) larger than the character width. Different fonts have different metrics. The Ludus font supports many games, not just chess. > Should they not be exactly the same, to get the best (square) display of the chess boards?
 (Not that it is very visible, > but a bit.) I didn’t overcompensate in the proposal document to make absolutely perfect charts; it’s reasonable to know that from font to font control over leading may be necessary. > I think the "ligatures" approach is a dead end. I hope others will think so too. [1] > - As you mention, the fallback will have very different line lengths for the lines of a board display, > and thus basically unreadable. Since the proposal takes as read that chess data should be parseable and plain-text, an approach with better legibility should be considered superior to an approach with poorer legibility. [2] > - If ZWJ is not needed, one will need two *new* characters that (in some fonts) ligate with chess pieces. > No existing character should ever ligate with chess pieces. I’d agree, for even if there were “ligate with light/dark chess square”, fallback would be illegible per [1] above. [3] > - If ZWJ is needed, then one can use some existing characters as board squares. Not sure what you mean, but it’s probably not important since ZWJ is a bad idea and because of [1] above. [4] > - In either case, it is not clear (or obvious) which should come first, a chess piece or a board square. > There will surely be mistakes, giving them in the wrong order (not a problem in your proposal). The one thing about my proposal is that a parser could tell someone if there were a missing VS or the wrong VS, though when you are typesetting with a conformant font, the visual feedback is enough. [5] > - My personal guesstimate is that there will be much fewer fonts that would implement the ligation > (if that approach was to be chosen), than would implement the VS approach you are suggesting. And THAT's the reason it has been proposed as a standardized sequence. Chess is an important activity and chess literature is vast and should be properly supported by the UCS. > Thus I support your proposal, since that gives: > - Good fallback (readable, though ugly).
> - Fairly good display when the VS sequences are interpreted (and the font is otherwise reasonable), > and "good" context (line height setting, not too short lines so that auto line breaking is avoided, ...). > - Easier to machine parse than the ligatures approach; and MUCH easier to parse than an HTML version. > - Easy to convert to (say) HTML for even better display in (say) HTML pages (CAN look much better, > and NO dependence on line height setting or line width setting (or bidi direction derivations), but > just that the table (for the board) is reasonably done. Thank you for your consideration of the proposal, and of your support. Michael From jameskasskrv at gmail.com Wed Apr 5 18:49:32 2017 From: jameskasskrv at gmail.com (James Kass) Date: Wed, 5 Apr 2017 15:49:32 -0800 Subject: PETSCII mapping? In-Reply-To: <14e28283-bc59-096b-5b1b-5a4124fb66c0@ix.netcom.com> References: <38d70a68-aabe-a6d1-50cf-cbdf2f92b88f@ix.netcom.com> <14e28283-bc59-096b-5b1b-5a4124fb66c0@ix.netcom.com> Message-ID: Asmus Freytag wrote, > There's no need for inflammatory rhetoric. Indeed not. How fortunate we are that nobody has posted any. 
Best regards, James Kass From everson at evertype.com Wed Apr 5 19:11:09 2017 From: everson at evertype.com (Michael Everson) Date: Thu, 6 Apr 2017 01:11:09 +0100 Subject: Coloured Punctuation and Annotation In-Reply-To: <20170405224817.4149f845@JRWUBU2> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <20170405091030.116883a5@JRWUBU2> <20170405224817.4149f845@JRWUBU2> Message-ID: <9FCA9B1F-00D7-459E-8567-59589609A708@evertype.com> On 5 Apr 2017, at 22:48, Richard Wordingham wrote: > I tried to read it from UTS#51 'Unicode Emoji', which is not part of TUS, but I couldn't deduce that a font that enables U+10B99 PSALTER PAHLAVI SECTION MARK to have exactly two (as opposed to none or four) red dots is in breach of the guidelines therein. Kindly explain how ANY font could do this. > Are we really going to have to set up Psalter Pahlavi emoji? There's also some encoded Ethiopic punctuation that certainly used to have red dots. If you want 10B99 to have different coloured dots (the rings? the dots?) the only precedent we have in the UCS is (1) to name a whole glyph with a colour like RED APPLE and then to hatch the glyph in black and white or (2) use the emoji property. > I think the emoji database has overlooked an entire script of emoji - the Egyptian hieroglyphs! Put it out of your mind. Michael Everson From everson at evertype.com Wed Apr 5 19:13:46 2017 From: everson at evertype.com (Michael Everson) Date: Thu, 6 Apr 2017 01:13:46 +0100 Subject: PETSCII mapping? In-Reply-To: <14e28283-bc59-096b-5b1b-5a4124fb66c0@ix.netcom.com> References: <38d70a68-aabe-a6d1-50cf-cbdf2f92b88f@ix.netcom.com> <14e28283-bc59-096b-5b1b-5a4124fb66c0@ix.netcom.com> Message-ID: <74CFAEB0-E41D-467C-B48C-E1CC5D6D2A99@evertype.com> I agree with Rebecca.
It’s going to be a handful of characters, used by the handful of people who use legacy character sets. Those people exist (I run Mac OS 9 regularly because it’s necessary for some of my work) and since some of these legacy characters are encoded, it makes sense to make sure all of them are. It’s no harm to the standard to support them. Asmus is right. It needs a proposal. > On 5 Apr 2017, at 23:14, Asmus Freytag (c) wrote: > > On 4/5/2017 2:25 PM, Rebecca T wrote: >> > If there's a credible need to convert files between Unicode-based systems and >> > those using PETSCII >> >> There is! It’s called “sharing textual information” and it’s how our society >> functions. Can we afford to blithely abandon data from the best selling >> computer in history [1] because nobody cared to standardize its? > > There's no need for inflammatory rhetoric. > > If you believe there is a credible need, then it should be easy to document that as part of a proposal. > > Nothing gets decided by the UTC unless there's a proposal on the table. From everson at evertype.com Wed Apr 5 19:14:59 2017 From: everson at evertype.com (Michael Everson) Date: Thu, 6 Apr 2017 01:14:59 +0100 Subject: Coloured Punctuation and Annotation In-Reply-To: <6883042b-8247-4c89-a47c-eb699cd6e590@ix.netcom.com> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <20170405091030.116883a5@JRWUBU2> <20170405224817.4149f845@JRWUBU2> <6883042b-8247-4c89-a47c-eb699cd6e590@ix.netcom.com> Message-ID: <0D5792F5-0D5E-47F4-BA9B-6FB4BC555BAA@evertype.com> > On 5 Apr 2017, at 23:16, Asmus Freytag wrote: > > Do you have any examples of plain text that is rendered with parts of characters having white (opaque) background?
> > I'm not aware of any There are certainly MSS (in many languages) where some punctuation made of dots have some of the dots red and some black. Michael Everson From everson at evertype.com Wed Apr 5 19:19:02 2017 From: everson at evertype.com (Michael Everson) Date: Thu, 6 Apr 2017 01:19:02 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: Message-ID: <0A60DF73-289F-40E6-A232-C4CB2593B794@evertype.com> On 6 Apr 2017, at 00:12, James Kass wrote: > > Kent Karlsson wrote, > >> - with the extra requirement to have VSs also for the boarder line drawing characters (to make them fit for drawing chess board boarders, in a general purpose font), and > > This doesn't seem necessary. A general purpose font modified to display the chess board in plain text in accordance with Michael Everson's proposal would be expected to use the same metrics as the box drawing glyphs for all of the VS-produced glyphs. A general purpose font *not* so modified would not be expected to display the chessboard in a perfect square, anyway. (Yet the display would still be legible.) Well. 1) A general purpose font that wanted to support chessboards as well as legacy graphic terminals would make use of VS for the border characters in order to be able to do both. 
 2) If we decided to standardize on that, we would have to burden chess-font designers with either a) learning how to draw graphic terminal characters correctly in their chess fonts along with the characters + VS for actual use, or b) ignoring graphic terminal character shapes and just pasting in the chess shapes to those code positions. Michael Everson From asmusf at ix.netcom.com Wed Apr 5 19:29:57 2017 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Wed, 5 Apr 2017 17:29:57 -0700 Subject: Coloured Punctuation and Annotation In-Reply-To: <0D5792F5-0D5E-47F4-BA9B-6FB4BC555BAA@evertype.com> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <20170405091030.116883a5@JRWUBU2> <20170405224817.4149f845@JRWUBU2> <6883042b-8247-4c89-a47c-eb699cd6e590@ix.netcom.com> <0D5792F5-0D5E-47F4-BA9B-6FB4BC555BAA@evertype.com> Message-ID: <4d6aa681-3731-b646-43bf-cc37a3b930de@ix.netcom.com> On 4/5/2017 5:14 PM, Michael Everson wrote: >> On 5 Apr 2017, at 23:16, Asmus Freytag wrote: >> >> Do you have any examples of plain text that is rendered with parts of characters having white (opaque) background? >> >> I'm not aware of any > There are certainly MSS (in many languages) where some punctuation made of dots have some of the dots red and some black. Agreed, those would be a challenge to reproduce with standard font technology and in plain text. But for the same reason, they are out of scope for plain text (and therefore a bit irrelevant to the current discussion).
A./ From kent.karlsson14 at telia.com Wed Apr 5 19:31:39 2017 From: kent.karlsson14 at telia.com (Kent Karlsson) Date: Thu, 06 Apr 2017 02:31:39 +0200 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <6490CD43-45AF-40C0-9AB4-A2F8937DFF4E@evertype.com> Message-ID: Exactly. /K Den 2017-04-06 01:25, skrev "Michael Everson" : > 2581 FE00; Chessboard box drawing; # LOWER ONE EIGHTH BLOCK > 258F FE00; Chessboard box drawing; # LEFT ONE EIGHTH BLOCK > 2594 FE00; Chessboard box drawing; # UPPER ONE EIGHTH BLOCK > 2595 FE00; Chessboard box drawing; # RIGHT ONE EIGHTH BLOCK > 2596 FE00; Chessboard box drawing; # QUADRANT LOWER LEFT > 2597 FE00; Chessboard box drawing; # QUADRANT LOWER RIGHT > 2598 FE00; Chessboard box drawing; # QUADRANT UPPER LEFT > 259D FE00; Chessboard box drawing; # QUADRANT UPPER RIGHT > > I guess I see your point. It does no harm, especially if the font might > possibly be used for graphics terminal emulation. ;-) From asmusf at ix.netcom.com Wed Apr 5 19:32:32 2017 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Wed, 5 Apr 2017 17:32:32 -0700 Subject: PETSCII mapping? In-Reply-To: References: <38d70a68-aabe-a6d1-50cf-cbdf2f92b88f@ix.netcom.com> <14e28283-bc59-096b-5b1b-5a4124fb66c0@ix.netcom.com> Message-ID: <0a37b5b6-98fc-2fb5-6489-4785140f7fa3@ix.netcom.com> On 4/5/2017 4:49 PM, James Kass wrote: > Asmus Freytag wrote, > >> There's no need for inflammatory rhetoric. > Indeed not. How fortunate we are that nobody has posted any. Indeed. Grabbed the wrong item from my word bin today. A./ > > Best regards, > > James Kass > From everson at evertype.com Wed Apr 5 19:47:41 2017 From: everson at evertype.com (Michael Everson) Date: Thu, 6 Apr 2017 01:47:41 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: Message-ID: Well, see my follow-up to James Kass and evaluate the merits of the two choices. 
 Do generic font makers intend to support both graphic terminal emulation and chess? Should chess font makers be burdened with graphic terminal emulation glyphs they know nothing about? > On 6 Apr 2017, at 01:31, Kent Karlsson wrote: > > > Exactly. > > /K > > Den 2017-04-06 01:25, skrev "Michael Everson" : > >> 2581 FE00; Chessboard box drawing; # LOWER ONE EIGHTH BLOCK >> 258F FE00; Chessboard box drawing; # LEFT ONE EIGHTH BLOCK >> 2594 FE00; Chessboard box drawing; # UPPER ONE EIGHTH BLOCK >> 2595 FE00; Chessboard box drawing; # RIGHT ONE EIGHTH BLOCK >> 2596 FE00; Chessboard box drawing; # QUADRANT LOWER LEFT >> 2597 FE00; Chessboard box drawing; # QUADRANT LOWER RIGHT >> 2598 FE00; Chessboard box drawing; # QUADRANT UPPER LEFT >> 259D FE00; Chessboard box drawing; # QUADRANT UPPER RIGHT >> >> I guess I see your point. It does no harm, especially if the font might >> possibly be used for graphics terminal emulation. ;-) > > From kent.karlsson14 at telia.com Wed Apr 5 19:53:36 2017 From: kent.karlsson14 at telia.com (Kent Karlsson) Date: Thu, 06 Apr 2017 02:53:36 +0200 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <6490CD43-45AF-40C0-9AB4-A2F8937DFF4E@evertype.com> Message-ID: Den 2017-04-06 01:25, skrev "Michael Everson" : > Oh, you misunderstood me. I knew it was raw HTML. I didn’t expect it to > render. But it was meaningless code. It was a response to Marcus, in that HTML might be used (with existing characters and no VSs) to format chess boards. And he is right, as proven by the HTML code I (basically) copied from stackoverflow. And it does typeset better plain text chess boards à la your proposal...
 /Kent K From everson at evertype.com Wed Apr 5 19:56:38 2017 From: everson at evertype.com (Michael Everson) Date: Thu, 6 Apr 2017 01:56:38 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: Message-ID: On 6 Apr 2017, at 01:53, Kent Karlsson wrote: > >> Oh, you misunderstood me. I knew it was raw HTML. I didn’t expect it to render. But it was meaningless code. > > It was a response to Marcus, in that HTML might be used (with existing characters and no VSs) to format chess boards. And he is right, as proven by the HTML code I (basically) copied from stackoverflow. Yes, I know this of course. (Well, whatever stackoverflow is.) > And it does typeset better plain text chess boards à la your proposal... Not with ordinary fonts and Unicode characters. And typographic care. From kent.karlsson14 at telia.com Wed Apr 5 19:54:11 2017 From: kent.karlsson14 at telia.com (Kent Karlsson) Date: Thu, 06 Apr 2017 02:54:11 +0200 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <6490CD43-45AF-40C0-9AB4-A2F8937DFF4E@evertype.com> Message-ID: Den 2017-04-06 01:25, skrev "Michael Everson" : >> - some bidi fix [preferably making the box/border drawing characters bidi >> "L", if possible; otherwise a caveat that >> if there is an expectation to paste in such a board into an RTL document, >> bidi controls need be used to LTR the board]). > > I don’t know if there is a problem here and am not able to offer a solution if > there is. I don’t object to a solution, if there is a problem. I would think that anyone pasting a chess board (à la your proposal) to an RTL context will see that something went amiss, and also know enough about bidi to set the bidi context to LTR for the chess board(s), either by some setting, or by inserting bidi control characters. So a small caveat is all that is necessary. Like: "The chess boards are assumed to be set in a left-to-right bidi context."
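The "inserting bidi control characters" option can be made concrete. What follows is a minimal sketch, not part of the proposal: it assumes the board is held as a newline-separated string and that the receiving application honours the Unicode bidi isolate controls. The function name `isolate_ltr` and the two-rank sample are hypothetical, chosen only for illustration.

```python
# Bidi isolate controls defined by the Unicode Bidirectional Algorithm.
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate_ltr(board: str) -> str:
    """Wrap every line of a plain-text board in LRI ... PDI.

    Each line is isolated separately, so a line break inside the board
    does not leave an isolate open across paragraphs.
    """
    return "\n".join(f"{LRI}{line}{PDI}" for line in board.split("\n"))


# Hypothetical fragment: three black pieces over three black pawns.
sample = "\u265c\u265e\u265d\n\u265f\u265f\u265f"
wrapped = isolate_ltr(sample)
```

The chess pieces and the block and box drawing characters are bidi-neutral, which is why their order can flip in a right-to-left paragraph; the isolates pin each rank to left-to-right order without changing the characters themselves.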
/Kent K From kent.karlsson14 at telia.com Wed Apr 5 20:05:25 2017 From: kent.karlsson14 at telia.com (Kent Karlsson) Date: Thu, 06 Apr 2017 03:05:25 +0200 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: Message-ID: Den 2017-04-06 02:47, skrev "Michael Everson" : > Well, see my follow-up to James Kass and evaluate the merits of the two > choices. > Do generic font makers intend to support both graphic terminal > emulation and chess? I don't know. But it should not be impossible to do so. > Should chess font makers be burdened with graphic > terminal emulation glyphs they know nothing about? If it is really a chess font, they can just use the glyphs for the chess variety also as the "plain" (terminal emulator variety), and it would not matter (as long as no-one insist on using it for terminal emulation). All that is needed for that is a manoeuvre to copy a few glyphs within the font (when creating the font). I guess that is not very hard... /Kent K >> On 6 Apr 2017, at 01:31, Kent Karlsson wrote: >> >> >> Exactly. >> >> /K >> >> Den 2017-04-06 01:25, skrev "Michael Everson" : >> >>> 2581 FE00; Chessboard box drawing; # LOWER ONE EIGHTH BLOCK >>> 258F FE00; Chessboard box drawing; # LEFT ONE EIGHTH BLOCK >>> 2594 FE00; Chessboard box drawing; # UPPER ONE EIGHTH BLOCK >>> 2595 FE00; Chessboard box drawing; # RIGHT ONE EIGHTH BLOCK >>> 2596 FE00; Chessboard box drawing; # QUADRANT LOWER LEFT >>> 2597 FE00; Chessboard box drawing; # QUADRANT LOWER RIGHT >>> 2598 FE00; Chessboard box drawing; # QUADRANT UPPER LEFT >>> 259D FE00; Chessboard box drawing; # QUADRANT UPPER RIGHT >>> >>> I guess I see your point. It does no harm, especially if the font might >>> possibly be used for graphics terminal emulation. 
 ;-) >> >> > > From everson at evertype.com Wed Apr 5 20:05:50 2017 From: everson at evertype.com (Michael Everson) Date: Thu, 6 Apr 2017 02:05:50 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: Message-ID: <9C3A99D8-D873-41E6-8014-D163C4EF2597@evertype.com> On 6 Apr 2017, at 01:54, Kent Karlsson wrote: >>> - some bidi fix [preferably making the box/border drawing characters bidi "L", if possible; otherwise a caveat that if there is an expectation to paste in such a board into an RTL document, bidi controls need be used to LTR the board]). >> >> I don’t know if there is a problem here and am not able to offer a solution if there is. I don’t object to a solution, if there is a problem. > > I would think Come on. This is a serious proposal. I’m glad you support it, but if you are going to raise an issue like this, “I would think and guess about a problem” isn’t the same as “I have tried and here’s an actual problem”. Roozbeh, there’s an issue that might benefit from your expertise. Can you look into it? Discussion needn’t occur here, but offline with Kent and me, if you prefer. > that anyone pasting a chess board (à la your proposal) to an RTL context will see that something went amiss, Will they? Why? > and also know enough about bidi to set the bidi context to LTR for the chess board(s), RTL users understand the problems of cutting and pasting LTR text and symbols, certainly. LTR users don’t. > either by some setting, or by inserting bidi control characters. Well, if there’s a problem it should be well-defined so it can be tackled. > So a small caveat is all that is necessary. Like: "The chess boards are assumed to be set in a left-to-right bidi context." THAT I can put into the document, but since chess is important in both the RTL and LTR worlds, it would be good to know what’s what.
Thank you again for your thoughtfulness,
Michael

From everson at evertype.com Wed Apr 5 20:08:38 2017
From: everson at evertype.com (Michael Everson)
Date: Thu, 6 Apr 2017 02:08:38 +0100
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References:
Message-ID: <03A47EDF-0C59-4845-A8B0-23D0A9739D15@evertype.com>

On 6 Apr 2017, at 02:05, Kent Karlsson wrote:

>> Do generic font makers intend to support both graphic terminal emulation and chess?
>
> I don't know. But it should not be impossible to do so.

And you think the proposal as it does leads to that?

>> Should chess font makers be burdened with graphic terminal emulation glyphs they know nothing about?
>
> If it is really a chess font, they can just use the glyphs for the chess variety also as the "plain" (terminal emulator variety), and it would not matter (as long as no-one insists on using it for terminal emulation).

Ha, so you're saying it's mostly for things like Everson Mono that it matters? ;-)

> All that is needed for that is a manoeuvre to copy a few glyphs within the font (when creating the font). I guess that is not very hard...

It is not.

Michael Everson

From jameskasskrv at gmail.com Wed Apr 5 20:44:48 2017
From: jameskasskrv at gmail.com (James Kass)
Date: Wed, 5 Apr 2017 17:44:48 -0800
Subject: PETSCII mapping?
In-Reply-To:
References: <38d70a68-aabe-a6d1-50cf-cbdf2f92b88f@ix.netcom.com>
Message-ID:

Rebecca Bettencourt wrote,

> I can put together a unified chart, with mappings to Unicode where
> they exist. In fact I think I'll do that. :)

I hope you do. That would be a good starting point.

> I'm all willing to help put together a proposal for encoding missing
> block element characters, but I would need other people to a) gather
> evidence of use in plain text and b) write up the proposal in Unicode's
> formal language since I've never proposed characters to Unicode before.

Even the most prolific of our proposers had to start someplace...
> As time goes on, "not in widespread use" will become a flimsier
> and flimsier argument against inclusion...

Agreed. As arguments go, that one was never very robust.

Best regards,

James Kass

From lokedhs at gmail.com Wed Apr 5 21:40:19 2017
From: lokedhs at gmail.com (Elias Mårtenson)
Date: Thu, 6 Apr 2017 10:40:19 +0800
Subject: PETSCII mapping?
In-Reply-To:
References: <38d70a68-aabe-a6d1-50cf-cbdf2f92b88f@ix.netcom.com>
Message-ID:

On 6 April 2017 at 09:44, James Kass wrote:

> Rebecca Bettencourt wrote,
>
> > I can put together a unified chart, with mappings to Unicode where
> > they exist. In fact I think I'll do that. :)
>
> I hope you do. That would be a good starting point.
>

The Wikipedia page on PETSCII has a character map where the missing characters are highlighted. Based on my count, there are 31 missing symbols. Those should be reasonably simple to document and highlight.

Do we also have to create an example font that includes these symbols? That seems to be what Michael Everson did for his chess notation proposal that I read recently.

Then there is the issue of what to do with the text colour and style selectors. PETSCII has characters that indicate a colour change as well as reverse video. At least the reverse video one is important, as it's being used to construct new characters. For example, PETSCII only has a single character "half block" (top part filled). The way you represent a half block with the bottom part filled is to use the reverse video together with the former.

It would probably make more sense to represent the reversed symbols as separate code points?

Regards,
Elias
From christoph.paeper at crissov.de Wed Apr 5 22:01:23 2017
From: christoph.paeper at crissov.de (Christoph Päper)
Date: Thu, 6 Apr 2017 05:01:23 +0200 (CEST)
Subject: Emoji Compatibility Symbols
In-Reply-To:
References:
Message-ID: <1947780010.43276.1491447683535.JavaMail.open-xchange@app06.ox.hosteurope.de>

Charlotte Buff :
>
> That document was very helpful, but unfortunately many of the images are
> missing.

would fix that.

From duerst at it.aoyama.ac.jp Wed Apr 5 22:24:21 2017
From: duerst at it.aoyama.ac.jp (Martin J. Dürst)
Date: Thu, 6 Apr 2017 12:24:21 +0900
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <20170405091030.116883a5@JRWUBU2>
Message-ID: <7d222120-a2a8-56f0-bc6e-348887b95d9f@it.aoyama.ac.jp>

On 2017/04/05 23:49, Michael Everson wrote:

> Oh, here is the answer to your question. It took me 15 seconds to change the background and text colour in Quark XPress. It has nothing to do with the proposal for variation sequences.
>
> http://evertype.com/standards/unicode-list/looking-glass-yellow-blue.png

[OT] It looks neat. But I noticed three very small gaps in each of the top and bottom borders. Also, it's probably not the best choice of colors, because my eyes tend to associate the yellow figures with white, and the blue ones with black, but thinking it through makes it clear that it's the other way round.

Regards, Martin.

From beckiergb at gmail.com Wed Apr 5 22:32:49 2017
From: beckiergb at gmail.com (Rebecca Bettencourt)
Date: Wed, 5 Apr 2017 20:32:49 -0700
Subject: PETSCII mapping?
In-Reply-To:
References: <38d70a68-aabe-a6d1-50cf-cbdf2f92b88f@ix.netcom.com>
Message-ID:

On 6 April 2017 at 09:44, James Kass wrote:

> Rebecca Bettencourt wrote,
>
> > I can put together a unified chart, with mappings to Unicode where
> > they exist. In fact I think I'll do that. :)
>
> I hope you do. That would be a good starting point.
>

I'm working on it!

On Wed, Apr 5, 2017 at 7:40 PM, Elias Mårtenson wrote:

> Do we also have to create an example font that includes these symbols?
> That seems to be what Michael Everson did for his chess notation proposal
> that I read recently.
>

We do have to provide Unicode with fonts, I believe. We can use an existing C64 font, such as Pet Me. Or, we can create a new font with vectorized versions of the characters.

> Then there is the issue of what to do with the text colour and style
> selectors. PETSCII has characters that indicate a colour change as well as
> reverse video. At least the reverse video one is important, as it's being
> used to construct new characters. For example, PETSCII only has a single
> character "half block" (top part filled). The way you represent a half
> block with the bottom part filled is to use the reverse video together with
> the former.
>
> It would probably make more sense to represent the reversed symbols as
> separate code points?
>

I would actually leave the color-change and reverse-video characters to a higher-level protocol.

>
> Regards,
> Elias
>
From richard.wordingham at ntlworld.com Wed Apr 5 23:41:07 2017
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Thu, 6 Apr 2017 05:41:07 +0100
Subject: Coloured Punctuation and Annotation
In-Reply-To: <9FCA9B1F-00D7-459E-8567-59589609A708@evertype.com>
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <20170405091030.116883a5@JRWUBU2> <20170405224817.4149f845@JRWUBU2> <9FCA9B1F-00D7-459E-8567-59589609A708@evertype.com>
Message-ID: <20170406054107.20e40bd5@JRWUBU2>

On Thu, 6 Apr 2017 01:11:09 +0100 Michael Everson wrote:

> On 5 Apr 2017, at 22:48, Richard Wordingham
> wrote:
>
> > I tried to read it from UTS#51 'Unicode Emoji', which is not part
> > of TUS, but I couldn't deduce that a font that enables U+10B99
> > PSALTER PAHLAVI SECTION MARK to have exactly two (as opposed to
> > none or four) red dots is in breach of the guidelines therein.
>
> Kindly explain how ANY font could do this.

Is this a trick question? The character consists of 4 dots arranged at the corners of a diamond. The top and bottom dots are traditionally red. From the proposal, they may be drawn as circles, but I'm not completely sure from the wording that this isn't a transcriptional convention. The left and right dots are in the colour of the accompanying letters.

I haven't done it myself, so I can only give you my interpretation of the OpenType standard. An OpenType font not using SVG would use 3 outline glyph definitions - one for ultimate monochrome rendering, one for the red dots, and one for the black (or whatever) dots. The cmap table would map the character to the first glyph. The COLR table (the 'color table') https://www.microsoft.com/typography/otspec/colr.htm then maps the first glyph to a combination of the second and third glyphs.
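
The lookup chain just described can be sketched as a toy model. This is not a font parser, only an illustration of the data flow, with three stated assumptions: the glyph names and the one-entry palette are invented, plain dicts stand in for the cmap, COLR and CPAL tables, and colours are written as hex strings. The one detail taken from the OpenType specification is that palette index 0xFFFF in a COLR layer means "use the current text foreground colour".

```python
FOREGROUND_INDEX = 0xFFFF  # COLR: layer is drawn in the text foreground colour

# Hypothetical glyph names for U+10B99; not taken from any real font.
CMAP = {0x10B99: "sectionmark.mono"}

# Base glyph -> ordered layers of (layer glyph, palette entry index).
COLR = {
    "sectionmark.mono": [
        ("sectionmark.topbottom", 0),                 # the two red dots
        ("sectionmark.leftright", FOREGROUND_INDEX),  # follow the text colour
    ]
}

# A single palette whose entry 0 is red.
CPAL = [["#FF0000"]]


def resolve_layers(codepoint, foreground, palette=0):
    """Return (glyph, colour) pairs to paint for one character."""
    base = CMAP[codepoint]
    layers = COLR.get(base)
    if layers is None:
        # No colour layers: fall back to the monochrome glyph.
        return [(base, foreground)]
    return [
        (glyph, foreground if index == FOREGROUND_INDEX else CPAL[palette][index])
        for glyph, index in layers
    ]


if __name__ == "__main__":
    print(resolve_layers(0x10B99, "#800080"))  # purple text
    print(resolve_layers(0x10B99, "#FF0000"))  # red text
```

With a purple foreground the model yields one red layer and one purple layer; with a red foreground both layers come out red, as with a monochrome font.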
The second glyph would have its colour specified by an index into the currently selected colour palette. The third glyph would use 'index' 0xFFFF to specify that it be displayed in the foreground colour. The palette is defined in the CPAL table (the 'color palette table') https://www.microsoft.com/typography/otspec/cpal.htm . All palettes (there need only be one) should have a common index value designating red. As far as I can see, that's job done - a natural two-tone glyph.

If one set the foreground colour to purple, then the top and bottom dots will be red and the left and right dots will be purple - exactly two red dots. Of course, if one selects red as the foreground colour, one will get four red dots, as with a monochrome font.

> > Are we really going to have to set up Psalter Pahlavi emoji?
> > There's also some encoded Ethiopic punctuation that certainly used
> > to have red dots.
>
> If you want 10B99 to have different coloured dots (the rings? the
> dots?) the only precedent we have in the UCS is (1) to name a whole
> glyph with a colour like RED APPLE and then to hatch the glyph in
> black and white or (2) use the emoji property.

Chapter and verse for (2), please. I searched and couldn't find it. To be precise, what says that a 'text presentation' has to be monochrome?

> > I think the emoji database has overlooked an entire script of emoji
> > - the Egyptian hieroglyphs!
>
> Put it out of your mind.

A prohibition on fonts delivering appropriately coloured hieroglyphs seems wrong. The hieroglyphs don't have the emoji property, so compliance to the standards promulgated by the UTC would rule such a font out if you and Asmus are correct.

Richard.

From 637275 at gmail.com Thu Apr 6 00:10:02 2017
From: 637275 at gmail.com (Rebecca T)
Date: Thu, 6 Apr 2017 01:10:02 -0400
Subject: PETSCII mapping?
In-Reply-To:
References: <38d70a68-aabe-a6d1-50cf-cbdf2f92b88f@ix.netcom.com>
Message-ID:

The Wikipedia page for PETSCII [1] only marks 20 characters as not having Unicode equivalents; 2px (light) and 3px (heavy) horizontal and vertical bars at various non-center positions, diagonal shading characters, and corner characters.

I've done some processing to the table on [1] to filter out the missing characters; their exact codepoints and descriptions can be found in [2]. These characters are highlighted in red in the attached image (green characters are also missing but are duplicates of other characters in the chart), and marked by U+FFFD REPLACEMENT CHARACTER in the compact table [3].

The box-drawing characters seem to semantically represent lines (boxes) and the block elements seem to represent shapes and shades; this makes $7c, $7f, $a7, $a8, $a9, $b6, $b7, and $b8 block elements and the rest box-drawing characters.

[1]: https://en.m.wikipedia.org/wiki/PETSCII
[2]: https://github.com/9999years/Unicode-PETSCII/blob/master/new.txt
[3]: https://github.com/9999years/Unicode-PETSCII/blob/master/graphic-table.txt

[image: Inline image 1]

On Wed, Apr 5, 2017 at 11:32 PM, Rebecca Bettencourt wrote:

> On 6 April 2017 at 09:44, James Kass wrote:
>
>> Rebecca Bettencourt wrote,
>>
>> > I can put together a unified chart, with mappings to Unicode where
>> > they exist. In fact I think I'll do that. :)
>>
>> I hope you do. That would be a good starting point.
>>
>
> I'm working on it!
>
> On Wed, Apr 5, 2017 at 7:40 PM, Elias Mårtenson wrote:
>
>> Do we also have to create an example font that includes these symbols?
>> That seems to be what Michael Everson did for his chess notation proposal
>> that I read recently.
>>
>
> We do have to provide Unicode with fonts, I believe. We can use an
> existing C64 font, such as Pet Me. Or, we can create a new font with
> vectorized versions of the characters.
>
>
>> Then there is the issue of what to do with the text colour and style
>> selectors.
>> PETSCII has characters that indicate a colour change as well as
>> reverse video. At least the reverse video one is important, as it's being
>> used to construct new characters. For example, PETSCII only has a single
>> character "half block" (top part filled). The way you represent a half
>> block with the bottom part filled is to use the reverse video together with
>> the former.
>>
>> It would probably make more sense to represent the reversed symbols as
>> separate code points?
>>
>
> I would actually leave the color-change and reverse-video characters to a
> higher-level protocol.
>
>
>>
>> Regards,
>> Elias
>>
>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: new-characters.png
Type: image/png
Size: 21904 bytes
Desc: not available

From irgendeinbenutzername at gmail.com Thu Apr 6 00:14:49 2017
From: irgendeinbenutzername at gmail.com (Charlotte Buff)
Date: Thu, 6 Apr 2017 07:14:49 +0200
Subject: PETSCII mapping?
Message-ID:

Rebecca Bettencourt wrote:

> I'm all willing to help put together a proposal for encoding missing block
> element characters, but I would need other people to a) gather evidence of
> use in plain text and b) write up the proposal in Unicode's formal language
> since I've never proposed characters to Unicode before.

I'm in the process of preparing a proposal for several old character sets, one of them PETSCII. At the moment I am still mostly concerned with analyzing the sets and determining which characters can sensibly be unified with existing ones and how to best structure the included repertoire. Currently I am quite busy with university stuff so things progress rather slowly, though.
From 637275 at gmail.com Thu Apr 6 00:19:42 2017
From: 637275 at gmail.com (Rebecca T)
Date: Thu, 6 Apr 2017 01:19:42 -0400
Subject: Coloured Punctuation and Annotation
In-Reply-To: <20170406054107.20e40bd5@JRWUBU2>
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <20170405091030.116883a5@JRWUBU2> <20170405224817.4149f845@JRWUBU2> <9FCA9B1F-00D7-459E-8567-59589609A708@evertype.com> <20170406054107.20e40bd5@JRWUBU2>
Message-ID:

> The hieroglyphs don't have the emoji property

What does the emoji property mean, semantically? That the codepoint represents a pictograph or that vendors have "permission" to give it a colored, stylized representation? If we go with the first, then hieroglyphs should certainly be emoji. Although it seems unlikely (to put it lightly) that hieroglyph emoji would be deployed due to the burden on vendors, it does seem logically appropriate that we treat all pictographs equally, and aside from usage I see no difference between U+1F989 OWL 🦉 and U+13153 EGYPTIAN HIEROGLYPH G017 𓅓.

From lokedhs at gmail.com Thu Apr 6 00:24:23 2017
From: lokedhs at gmail.com (Elias Mårtenson)
Date: Thu, 6 Apr 2017 13:24:23 +0800
Subject: PETSCII mapping?
In-Reply-To:
References: <38d70a68-aabe-a6d1-50cf-cbdf2f92b88f@ix.netcom.com>
Message-ID:

On 6 April 2017 at 11:32, Rebecca Bettencourt wrote:

> We do have to provide Unicode with fonts, I believe. We can use an existing
> C64 font, such as Pet Me. Or, we can create a new font with vectorized
> versions of the characters.
>

Are there any existing C64 fonts with vectorised glyphs?

> Then there is the issue of what to do with the text colour and style
> selectors.
> PETSCII has characters that indicate a colour change as well as
> reverse video. At least the reverse video one is important, as it's being
> used to construct new characters. For example, PETSCII only has a single
> character "half block" (top part filled). The way you represent a half
> block with the bottom part filled is to use the reverse video together with
> the former.
>
>>
>> It would probably make more sense to represent the reversed symbols as
>> separate code points?
>>
>
> I would actually leave the color-change and reverse-video characters to a
> higher-level protocol.
>

For colour change, I definitely agree. The reverse video case is a bit different since the resulting characters are very much separate symbols by themselves. I think I need to take a closer look at existing C64 textual content to see how it was actually being used in real life.

I do recall that reverse video was heavily used in file names, so there is definitely an argument for introducing "COMBINING PETSCII REVERSE VIDEO". It would be unfortunate if higher-level markup is required to accurately represent the name of a file stored on a C64 floppy disc.

Regards,
Elias

From eik at iki.fi Thu Apr 6 00:29:49 2017
From: eik at iki.fi (Erkki I Kolehmainen)
Date: Thu, 6 Apr 2017 08:29:49 +0300
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com>
Message-ID: <003401d2ae96$cf705a60$6e510f20$@fi>

+1

Erkki I.
Kolehmainen
Mannerheimintie 75 B 37, 00270 Helsinki, Finland
Mob: +358400825943, Fax: +35813318116

From: Unicode [mailto:unicode-bounces at unicode.org] On behalf of Mark Davis ☕️
Sent: 4 April 2017 19:58
To: verdy_p
Cc: Michael Everson; Garth Wallace; unicode Unicode Discussion
Subject: Re: Proposal to add standardized variation sequences for chess notation

Amusing as this is, hard to believe that people are spending this much time on an April Fool's posting.

I'm looking forward to similar postings on checkers and go pieces. As a matter of fact, one that proposes adding new characters for every possible configuration of a go board would be imaginative. And I'm looking also forward to the ?+ZWJ+?? (etc) proposal.

Mark

On Tue, Apr 4, 2017 at 3:00 PM, Philippe Verdy wrote:

2017-04-04 1:30 GMT+02:00 Michael Everson :

On 3 Apr 2017, at 23:07, Asmus Freytag (c) wrote:

>
> On 4/3/2017 2:15 PM, Michael Everson wrote:
>> On 3 Apr 2017, at 17:16, Asmus Freytag wrote:
>>
>>>>> The same indirection is at play here.
>>>>>
>>>> This is pure rhetoric, Asmus. It addresses the problem in no way.
>>>>
>>> Actually it does. I'm amazed that you don't see the connection.
>>>
>> I've never understood you when you back up into that particular kind of abstract rhetoric.
>
> Sometimes thinking through something in abstract terms actually clarifies the situation.

Of course I know that's your view. It's just never been an effective communication strategy between you and me generally.

>>> The "meaning" of a chess-problem matrix is the whole 8 × 8 board, not the empty dark square at b4 or the white pawn on
>
> In other words, you assert that partial boards never need to be displayed. (Let's take that as read, then).

No, I am sure that a variety of board shapes can be set in plain text with these conventions, though the principal concern is classical chess notation.

>> The "problem"
>> the higher-level protocol is supposed to solve is the one where a chess piece of one colour sits in an em-squared zone whether light or dark. In lead type this was a glyph issue. Lead type had just exactly what my proposal has: A piece with in-line text metrics, spaced harmoniously with digits and letters, and square sorts with and without hatching.
>
> Leaving aside the abstract question whether modeling lead type is ipso facto the best solution in all cases...

I think it was a good expedient solution in lead type and that this proposal offers a robust parseable digital version of that solution, and I assert people will make use of that data structure.

>> OK, then you support the part of the proposal that applies VS1 and VS2 to the chess pieces.
>
> My statement just was that a proposal where piece + VS should be M-square, piece w/o VS should be generic, might make some sense (and same for a suitable "empty" cell).
>
> The next question would be whether the alternation in background is best expressed in variation sequences or by some other means.

I think the value in the data structures I have described is best retained as text. Anything else just seems it would be simply needlessly complex,

> If you never need to show just a single field, then I concede that the main drawback of variation selectors for the background style is absent; however, reading ahead in your message, the partial grid appears to be common, therefore the reason to choose an alternate solution to the background style is a strong one.

Well, it's text, Asmus, so you can delete all but one line of a board if you want:

??????????????????

There. So... what are you talking about? It's a text matrix. It's like a kind of poem.

??????????
??????????????????
??????????????????
??????????????????
??????????????????
??????????????????
??????????????????
??????????????????
??????????????????
??????????

It even looks like one. That's a meaningful pattern. A kind of writing system.
For me it looks like ASCII art, a hack mixing various characters intended for different uses and ignoring all semantics, only working because it reuses similar-looking glyphs instead of being an actual encoding. That representation is absolutely not semantically coherent.

If we want to have true checkboard cells, we need characters specifically for them, and in them we'll place (or not) chess pieces or any other suitable symbol or letter. This means creating clusters (cell+ZWJ+piece). This will be coherent.

If we want to have borders for boards, we need coherent characters for them (we do not expect them to be combined with pieces, just that they will properly glue with cells in the middle of the board, and that their metrics match them in suitable fonts).

The fact that legacy renderers or fonts won't display that correctly is definitely not an argument. Many scripts still have problems being represented with legacy renderers or fonts. But the encoding is made to be coherent semantically. Fonts and renderers will adapt their properties to render what is semantically wanted and that will be also pleasing to read, and they still will be able to use various variants (e.g. emoji styles for pieces, possibly with 3D effects and colors, possibly animated pieces, or alternate decorative patterns in board cells, possibly photographic-based, such as wood, marble, grass, sand, glass, iron...)

From duerst at it.aoyama.ac.jp Thu Apr 6 02:01:38 2017
From: duerst at it.aoyama.ac.jp (Martin J. Dürst)
Date: Thu, 6 Apr 2017 16:01:38 +0900
Subject: Standaridized variation sequences for the Desert alphabet?
In-Reply-To:
References: <20170321104104.665a7a7059d7ee80bb4d670165c8327d.c6e0d0ee2d.wbe@email03.godaddy.com> <84F24B3C-9884-432C-B71F-B8D9D283DE9B@evertype.com> <24975108-52a4-cda4-737d-6a41ff1b5c14@it.aoyama.ac.jp> <17D37ACB-9269-4537-AE60-71BB6CA42366@evertype.com> <7686cee6-1b4f-d1a6-8cd7-09859757465c@it.aoyama.ac.jp> <587FFDFA-CAAE-4F81-B60D-94EB9C550151@evertype.com> <2f05d26e-9d3f-4670-f667-1daf1cd53063@it.aoyama.ac.jp> <6C843948-F554-4C52-B103-36508595C4FB@evertype.com>
Message-ID:

Hello Michael,

[I started to write this mail quite some time ago. I decided to try to let things cool down a bit by waiting a day or two, but it has become more than a week now.]

On 2017/03/29 22:08, Michael Everson wrote:

> Martin,
>
> It's as though you'd not participated in this work for many years, really.

Well, looking back, my time commitment to Unicode has definitely varied over the years. But that might be true for everybody.

What's more important is that Unicode covers such a wide range of areas, and not everybody has the same experience or knowledge. If we did, we wouldn't need to work together; it would be okay to just have one of us. Indeed, what's really very valuable and interesting in this work is the many very varied backgrounds and experiences everybody has.

In addition to variations in background, we also have a wide variety of ways of thinking, e.g. ranging from abstract to concrete, and so on.

>> On 29 Mar 2017, at 11:12, Martin J. Dürst wrote:
>> - That suggests that IF this script is in current use,
>
> You don't even know? You're kidding, right?

Everything is relative. And without being part of the user community, it's difficult to make any guesses.

>> - As far as we have heard (in the course of the discussion, after questioning claims made without such information), it seems that:
>
> Yeah, it doesn't "seem"
> anything but a whole lot of special pleading to bolster your rigid view that the glyphs in question can be interchangeable because of the sounds they may represent.

I don't remember ever claiming that the glyphs must be used interchangeably, only that we should carefully examine whether they are or not, and that because they represent the same sound (in a phonetic alphabet, as it is) and are shown in the same position in alphabet tables, we shouldn't a priori exclude such a possibility.

>> - There may not be enough information to understand how the creators and early users of the script saw this issue,
>
> Um, yeah. As if there were for Phoenician, or Luwian hieroglyphs, right?

Well, there's well over an order of magnitude difference in the time scales involved. The language that Deseret is used to write is still in active use, including in this very discussion. Quite different from Phoenician or Luwian hieroglyphs.

In addition, we have meta-information such as alphabet tables, which we may not have for the scripts you mention, as well as the fact that printing technology may have forced a better identification of what's a character and what not than inscriptions and other older technologies.

>> - Similarly, there seem to be not enough modern practitioners of the script using the ligatures that could shed any light on the question asked in the previous item in a historical context,
>
> Completely irrelevant. Nobody worried about the number of modern users of the Insular letters we encoded. Why put such a constraints on users of Deseret? ?? ?? ?? ? ?? ?? ??.

Because it's modern users, and future users, not users some hundred years or so ago, that will use the encoding.
In the case of Insular letters, my guess is that nobody wants to translate/transcribe xkcd, for example, whereas there is such a transcription for Deseret: http://www.deseretalphabet.info/XKCD/

>> first apparently because there are not that many modern practitioners at all, and second because modern practitioners seem to prefer spelling with individual letters rather than using the ligatures.
>
> This is equally ridiculous. John Jenkins chooses not write the digraphs in the works which he transcribed, because that's what *he* chooses. He doesn't speak for anyone else who may choose to write in Deseret, and your assumption that "modern practitioners" do this is groundless.

You wrote:

>>>> Most readers and writers of Deseret today use the shapes that are in their fonts, which are those in the Unicode charts, and most texts published today don't use the EW and OI ligatures at all, because that's John Jenkins' editorial practice.
>>>>

So I was wrong to write "modern practitioners", and should have written "modern publishers" or "modern published texts". Or is the impression that I get from what you wrote above wrong that most texts published these days are edited by John, or by people following his practice?

> It also ignores the fact that the script had a reform and that the value of separate encodings for the various characters is of value to those studying the provenance and orthographic practices of those who wrote Deseret when it was in active use.

I don't remember denying the value of separate encodings for historic research. I only wanted to make sure that present-day use isn't inconvenienced to make historic research easier. If the claims are correct that present-day usage is mostly a reconstruction based on the Unicode encoding and the Unicode sample glyphs, then I'm fine with helping historic research.

> This is exactly the same thing as the medievalist Latin abbreviation and other characters we encoded.
> There is neither sense nor logic nor utility in trying to argue for why editors of Deseret documents shouldn't have the same kinds of tools that medievalists have. And as far as medievalist concerns go, many of the characters are used by relatively few researchers. Some of the characters we encoded are used all over Europe at many times. Some are used only by Nordicists, some by Celticists, and some by subsets within the Nordicist and Celticist communities.

Maybe, maybe not. If e.g. somebody came and said that they wanted to disunify the ſs and ſz ligatures for (German) ß in order to better analyze some old manuscripts, and the modern users from hereon had to make sure they used the right one depending on the font they used, then I'm sure a lot of Germans would complain quite clearly, because it would make their current use more complicated.

>> - IF the above is true, then it may be that these ligatures are mostly used for historic purposes only, in which case it wouldn't do any harm to present-day users if they were separated.
>
> Harm? What harm? Recently the UTC looked at a proposal for capital letters for ? and ?. Evidence for their existence was shown. One person on the call to the UTC said he didn't think anyone needed them. Two of us do need them. I needed them last weekend and I had to use awkward workarounds. They weren't accepted. There wasn't any good rationale for the rejection. I mean, the letters exist. Case is a normal function of the script. But they weren't accepted. For the guy who didn't think he needed them, well, so what? If they're encoded, he doesn't have to use them.

I have no idea what the reasons for this were, because I wasn't involved in the discussion.
>> If the above is roughly correct, then it's important that we reached that conclusion after explicitly considering the potential of a split to create inconvenience and confusion for modern practitioners,
>
> People who use Deseret use it for historical purposes and for cultural reasons. Everybody in Utah reads English in standard Latin orthography.

I haven't been in Utah except for a one-time flight change in Salt Lake City more than 10 years ago. So please don't assume that everybody on this list knows the state of usage for all the scripts that get discussed.

>> not after just looking at the shapes only, coming up with separate historical derivations for each of them, and deciding to split because history is way more important than modern practice.
>
> I didn't "come up" with separate historical derivations for the four characters in question.

I didn't mean "come up" in the sense of "make up out of thin air", but in the sense of "discover". If it wasn't you but somebody else who discovered these derivations, please let us know.

>> On 2017/03/28 22:56, Michael Everson wrote:
>>> On 28 Mar 2017, at 11:39, Martin J. Dürst wrote:
>>
>>> An æ ligature is a ligature of a and of e. It is not some sort of pretzel.
>>
>> Yes. But it's important that we know that because we have been faced with many cases where "æ" and "ae" were used interchangeably.
>
> Irrelevant. This is just spelling. It's no different than colour/color or maximize/maximise or aluminium/aluminum.

Whether we use "æ" or "ae" is indeed a matter of spelling. But I meant something else, namely that we know that what may look like a "pretzel" to the uninitiated is a ligature of 'a' and 'e' exactly because we use it as a spelling variant for "ae".

>>> What Deseret has is this:
>>>
>>> 10426 DESERET CAPITAL LETTER LONG OO WITH STROKE
>>> * officially named "ew" in the code chart
>>> * used for ew in earlier texts
>>> 10427 DESERET CAPITAL LETTER SHORT AH WITH STROKE
>>> * officially named "oi"
in the code chart >>> * used for oi in earlier texts >>> 1xxxx DESERET CAPITAL LETTER LONG AH WITH STROKE >>> * used for oi in later texts >>> 1xxxx DESERET CAPITAL LETTER SHORT OO WITH STROKE >>> * used for ew in later texts >> >> Currently, it has this: >> >> 10426 ?? DESERET CAPITAL LETTER OI >> >> 10427 ?? DESERET CAPITAL LETTER EW > > You are being deliberately obtuse. Note that I stated clearly ?officially named ?ew/oi? in the code chart?. Well, if you think I'm deliberately obtuse, then I'd have to say that I think you're (deliberately?) obscure. You repeat hypothetical, non-existing names such as "DESERET CAPITAL LETTER LONG OO WITH STROKE" over and over, using capitals to make then look like the actual names, and bury the actual names (such as "DESERET CAPITAL LETTER OI") by shortening and lowercasing them. > Don?t go trying to tell me that EW and SHORT OO WITH STROKE are glyph variants of the same character. > > Don?t go trying to tell me that LONG AH WITH STROKE and OI are glyph variants of the same character. > > They?re not. The origin of all those letterforms is obvious, You don't have to repeat that. I clearly said, maybe even more than once, that I can agree with your hypothesis on the origin of these letter forms. > and we do not encode sounds, we encode the elements of writing systems. Yes. And we know that individual elements of a writing system sometimes can have multiple origins. >> But we have seen cases where such a merge happens. ? is one of them. > > That?s even arguable because ?? only really occurs in the whole-font Fraktur style. It?s pretty rare to see it in Antiqua. Of course it must be attested there, but it?s by no means common. Do you mean that the merge didn't happen style-wise? That we therefore don't need separate code points because historians don't need to distinguish between the two; they can just rely on the font used? 
But even if that weren't the case, we would still want to treat it as one and the same character, with a single code point. It would still be hopelessly impractical for Germans to use two different characters, when they only can decide which character to type once they have seen the actual character in the font they type, and have to potentially change the character if they change the font. And while we currently have no evidence that Deseret had developed a typographic tradition where some type styles would use one set of ligatures, and other styles would use another set, it wouldn't be possible to reject this possibility without actually trying to find evidence one way or another. >> There are quite a few in Han (not surprising because there are tons of ideographs there to begin with). >> >> But that experience doesn't mean that we have to rush to a conclusion without examining as much of the evidence as we can get hold of. > > I haven?t rushed to a conclusion. I?ve made a thorough analysis. You made a thorough analysis of the graphic shapes. You may have made some analysis with respect to usage, but you didn't present it initially, and it took quite some time to get to it in this discussion. >>> You?re smarter than that. So are Asmus and Mark and Erkki and any of the other sceptics who have chimed in here. >> >> Skepticism is when presented with options without background facts is a virtue in my opinion. > > Your argument seemed to be based solely on the use of the letters for the sounds, ignoring the historical derivation and the facts of the spelling reform in Deseret. The spelling reform is fine. What is important is what happened after the spelling reform. Were the 1855 variants replaced by the 1859 variants? Was it two different traditions, separated in some way or other? Or was it in effect more like a mixture of both? (or maybe we don't know, or it's a little of everything?) 
Examining these questions and bringing the available data to light and clarifying the limits of our data and our understanding is very important. Only in this way can we make decisions that will hopefully be valid for the rest of the existence of Unicode (which might be quite a few decades at least), or decisions that at a minimum might be evaluated as "well, they didn't know better then", rather than as "they definitely should have known better, even then". >>> On 28 Mar 2017, at 11:59, Mark Davis ?? wrote: >>> >>>> ?I agree with Martin. >> >>>> Simply because someone used a particular shape at some time to mean a letter doesn't mean that Unicode should encode a letter for that shape. >>> >>> Coming to a forum like this out of a concern for the corpus of Deseret literature is not some sort of attempt to encode things for encoding?s sake. >> >> And coming to a discussion like this out of a concern for modern practitioners of the script (even if it seems, after a lot of discussion, that there aren't that many of these, and the issue at hand may indeed not concern them that much) is not some sort of attempt to unify things for unification's sake. > > I think you made a lot of assumptions about ?modern practitioners? which you didn?t disclose. Maybe. But so likewise, you made a lot of assumptions about (the absence) of modern practitioners which you didn't disclose. > A proposal will be forthcoming. I want to thank several people who have written to me privately supporting my position with regard to this topic on this list. I can only say that supporting me in public is more useful than supporting me in private. I'm looking forward to your proposal. I hope it clearly indicates why (you think) there's no danger of inconveniencing modern practitioners. I'd also like to thank the people who supported me, all of them on the list. Regards, Martin. 
From beckiergb at gmail.com Thu Apr 6 02:19:50 2017 From: beckiergb at gmail.com (Rebecca Bettencourt) Date: Thu, 6 Apr 2017 00:19:50 -0700 Subject: PETSCII mapping? In-Reply-To: References: <38d70a68-aabe-a6d1-50cf-cbdf2f92b88f@ix.netcom.com> Message-ID: I've completed my unified chart: https://docs.google.com/document/d/10RJKTNFZFEww0yRVPzPdeNnyC_PUkAMhn7OVB7YdTFc/edit?usp=sharing The result is either 20 or 24 characters to be encoded, depending on whether or not 4 of them should be unified with existing characters. 14 have fairly obvious names following a pattern established by existing characters: Left and Lower One Eighth Block Left and Upper One Eighth Block Right and Upper One Eighth Block Right and Lower One Eighth Block Left Half Medium Shade Lower Half Medium Shade Right One Quarter Block Right Three Eighths Block Upper One Quarter Block Upper Three Eighths Block Four-by-Four Checker Board Reverse Four-by-Four Checker Board Upper Left to Lower Right Fill Upper Right to Lower Left Fill 10 need some more thinking (they are all horizontal and vertical lines at various positions within the character cell; naming depends on if we want to unify some of them with the HORIZONTAL SCAN LINEs in the Miscellaneous Technical block). -- Rebecca Bettencourt On Wed, Apr 5, 2017 at 10:24 PM, Elias M?rtenson wrote: > On 6 April 2017 at 11:32, Rebecca Bettencourt wrote: > > We do have to provide Unicode with fonts, I believe. We can use an >> existing C64 font, such as Pet Me. Or, we can create a new font with >> vectorized versions of the characters. >> > > Are there any existing C64 fonts with vectorised glyphs? > > >> Then there is the issue of what to do with the text colour and style >> selectors. PETSCII has characters that indicate a colour change as well as >> reverse video. At least the reverse video one is important, as it's being >> used to construct new characters. For example, PETSCII only has a single >> character "half block" (top part filled). 
The way you represent a half >> block with the bottom part filled is to use the reverse video together with >> the former. >> >>> >>> It would probably make more sense to represent the reversed symbols as >>> separate code points? >>> >> >> I would actually leave the color-change and reverse-video characters to a >> higher-level protocol. >> > > For colour change, I definitely agree. The reverse video case is a bit > different since the resulting characters are very much separate symbols by > themselves. > > I think I need to take a closer look at existing C64 textual content to > see how it was actually being used in real life. I do recall that reverse > video was heavily used in file names, so there is definitely an argument > for introducing ?COMBINING PETSCII REVERSE VIDEO?. It would be unfortunate > if higher-level markup is required to accurately represent the name of a > file stored on a C64 floppy disc. > > Regards, > Elias > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lokedhs at gmail.com Thu Apr 6 02:25:42 2017 From: lokedhs at gmail.com (=?UTF-8?Q?Elias_M=C3=A5rtenson?=) Date: Thu, 6 Apr 2017 15:25:42 +0800 Subject: PETSCII mapping? In-Reply-To: References: <38d70a68-aabe-a6d1-50cf-cbdf2f92b88f@ix.netcom.com> Message-ID: Wouldn't it make sense to get in touch with active Commodore 64 communities to find out how people deal with this today? I'm sure there are use cases that none of us have thought about. Regards, Elias On 6 April 2017 at 15:19, Rebecca Bettencourt wrote: > I've completed my unified chart: > > https://docs.google.com/document/d/10RJKTNFZFEww0yRVPzPdeNnyC_ > PUkAMhn7OVB7YdTFc/edit?usp=sharing > > The result is either 20 or 24 characters to be encoded, depending on > whether or not 4 of them should be unified with existing characters. 
> > 14 have fairly obvious names following a pattern established by existing > characters: > > Left and Lower One Eighth Block > Left and Upper One Eighth Block > Right and Upper One Eighth Block > Right and Lower One Eighth Block > Left Half Medium Shade > Lower Half Medium Shade > Right One Quarter Block > Right Three Eighths Block > Upper One Quarter Block > Upper Three Eighths Block > Four-by-Four Checker Board > Reverse Four-by-Four Checker Board > Upper Left to Lower Right Fill > Upper Right to Lower Left Fill > > 10 need some more thinking (they are all horizontal and vertical lines at > various positions within the character cell; naming depends on if we want > to unify some of them with the HORIZONTAL SCAN LINEs in the Miscellaneous > Technical block). > > > > > > > > -- Rebecca Bettencourt > > On Wed, Apr 5, 2017 at 10:24 PM, Elias M?rtenson > wrote: > >> On 6 April 2017 at 11:32, Rebecca Bettencourt >> wrote: >> >> We do have to provide Unicode with fonts, I believe. We can use an >>> existing C64 font, such as Pet Me. Or, we can create a new font with >>> vectorized versions of the characters. >>> >> >> Are there any existing C64 fonts with vectorised glyphs? >> >> >>> Then there is the issue of what to do with the text colour and style >>> selectors. PETSCII has characters that indicate a colour change as well as >>> reverse video. At least the reverse video one is important, as it's being >>> used to construct new characters. For example, PETSCII only has a single >>> character "half block" (top part filled). The way you represent a half >>> block with the bottom part filled is to use the reverse video together with >>> the former. >>> >>>> >>>> It would probably make more sense to represent the reversed symbols as >>>> separate code points? >>>> >>> >>> I would actually leave the color-change and reverse-video characters to >>> a higher-level protocol. >>> >> >> For colour change, I definitely agree. 
The reverse video case is a bit >> different since the resulting characters are very much separate symbols by >> themselves. >> >> I think I need to take a closer look at existing C64 textual content to >> see how it was actually being used in real life. I do recall that reverse >> video was heavily used in file names, so there is definitely an argument >> for introducing ?COMBINING PETSCII REVERSE VIDEO?. It would be unfortunate >> if higher-level markup is required to accurately represent the name of a >> file stored on a C64 floppy disc. >> >> Regards, >> Elias >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From prosfilaes at gmail.com Thu Apr 6 02:27:23 2017 From: prosfilaes at gmail.com (David Starner) Date: Thu, 06 Apr 2017 07:27:23 +0000 Subject: Standaridized variation sequences for the Desert alphabet? In-Reply-To: References: <20170321104104.665a7a7059d7ee80bb4d670165c8327d.c6e0d0ee2d.wbe@email03.godaddy.com> <84F24B3C-9884-432C-B71F-B8D9D283DE9B@evertype.com> <24975108-52a4-cda4-737d-6a41ff1b5c14@it.aoyama.ac.jp> <17D37ACB-9269-4537-AE60-71BB6CA42366@evertype.com> <7686cee6-1b4f-d1a6-8cd7-09859757465c@it.aoyama.ac.jp> <587FFDFA-CAAE-4F81-B60D-94EB9C550151@evertype.com> <2f05d26e-9d3f-4670-f667-1daf1cd53063@it.aoyama.ac.jp> <6C843948-F554-4C52-B103-36508595C4FB@evertype.com> Message-ID: On Thu, Apr 6, 2017 at 12:07 AM Martin J. D?rst wrote: > And while we currently have no evidence that Deseret had developed a > typographic tradition where some type styles would use one set of > ligatures, and other styles would use another set, it wouldn't be > possible to reject this possibility without actually trying to find > evidence one way or another. > Deseret didn't really develop a typographic tradition at all. 
To quote Wikipedia: At least four books were published in the new alphabet, all transcribed by Orson Pratt and all using the Russell's House font: The First Deseret Alphabet Reader (1868), The Second Deseret Alphabet Reader (1868), The Book of Mormon (1869), and a Book of Mormon excerpt called First Nephi?Omni (1869). There's also a couple years where the Deseret News printed a short piece in the Deseret alphabet in every issue, but in any case, these new glyphs never had a metal type made for them and never saw print until modern times. -------------- next part -------------- An HTML attachment was scrubbed... URL: From c933103 at gmail.com Thu Apr 6 02:28:55 2017 From: c933103 at gmail.com (gfb hjjhjh) Date: Thu, 6 Apr 2017 15:28:55 +0800 Subject: Implementation of ideographic description characters In-Reply-To: References: Message-ID: Seems like Source Han Serif have just implemented such functionality? Or is this just partial. https://twitter.com/tualatrix/status/849178587680735232 -------------- next part -------------- An HTML attachment was scrubbed... URL: From christoph.paeper at crissov.de Thu Apr 6 04:08:43 2017 From: christoph.paeper at crissov.de (=?UTF-8?Q?Christoph_P=C3=A4per?=) Date: Thu, 6 Apr 2017 11:08:43 +0200 (CEST) Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <20170403203355.6cbfc184@JRWUBU2> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> Message-ID: <2098249861.45062.1491469723263.JavaMail.open-xchange@app06.ox.hosteurope.de> Richard Wordingham : > > The basic text elements in the scheme other than boundary markers will be: > > empty white square > empty black square > white square with specific piece on it > black square with specific piece on it. 
> > If the variation selectors are ignored, these simplify to: > > white square > hatched square > specific piece > > This preserves all the information; the pattern of squares is known in advance > and therefore redundant. I argued before that empty square specific piece would already be enough to carry all the required semantics, at least for drawing complete boards, because the coloring pattern is simple, well-known and redundant. From alastair at alastairs-place.net Thu Apr 6 04:09:47 2017 From: alastair at alastairs-place.net (Alastair Houghton) Date: Thu, 6 Apr 2017 10:09:47 +0100 Subject: PETSCII mapping? In-Reply-To: References: <38d70a68-aabe-a6d1-50cf-cbdf2f92b88f@ix.netcom.com> Message-ID: On 6 Apr 2017, at 08:25, Elias M?rtenson wrote: > > Wouldn't it make sense to get in touch with active Commodore 64 communities to find out how people deal with this today? I'm sure there are use cases that none of us have thought about. Since most of the issue is graphics characters, and since that same problem affects PETSCII, ATASCII, the ZX80 set, and Teletext/Videotex/Viewdata (aka BBC Micro mode 7), would it be better to come up with a complete set of extra graphic characters that need to be encoded, and make it a proposal to ?complete the set? of box drawing and graphics characters? IMO the Teletext set is *much* more important than PETSCII or ATASCII; while there will very likely be text encoded in the latter two, there are significant volumes encoded in the Teletext set. Quite a bit of data has already been lost (there are mirrors of old Prestel/Viewdata BBS systems, some of which have sadly lost all the graphics because of the lack of equivalent Unicode characters), and a lot of the rest is either encoded in a non-standard encoding or held as screen shots. 
Also, it would be worth looking to see if there are any discussions from past attempts to get any of these things into the Unicode standard; I can't imagine this is the first time anyone's asked for more graphics characters. Kind regards, Alastair. -- http://alastairs-place.net From khaledhosny at eglug.org Thu Apr 6 04:54:17 2017 From: khaledhosny at eglug.org (Khaled Hosny) Date: Thu, 6 Apr 2017 11:54:17 +0200 Subject: Coloured Punctuation and Annotation In-Reply-To: <4d6aa681-3731-b646-43bf-cc37a3b930de@ix.netcom.com> References: <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <20170405091030.116883a5@JRWUBU2> <20170405224817.4149f845@JRWUBU2> <6883042b-8247-4c89-a47c-eb699cd6e590@ix.netcom.com> <0D5792F5-0D5E-47F4-BA9B-6FB4BC555BAA@evertype.com> <4d6aa681-3731-b646-43bf-cc37a3b930de@ix.netcom.com> Message-ID: <20170406095417.GA21974@macbook> On Wed, Apr 05, 2017 at 05:29:57PM -0700, Asmus Freytag wrote: > On 4/5/2017 5:14 PM, Michael Everson wrote: > > > On 5 Apr 2017, at 23:16, Asmus Freytag wrote: > > > > > > Do you have any examples of plain text that is rendered with parts of characters having white (opaque) background? > > > > > > I'm not aware of any > > There are certainly MSS (in many languages) where some punctuation made of dots have some of the dots red and some black. > > Agreed, those would be a challenge to reproduce with standard font > technology and in plain text. Not any more, thanks to Emoji! This page should show colored Hamza, diacritical dots and vowel marks on web browsers that support the MS color font format (currently Firefox, Edge, and Internet Explorer on the latest Windows 10): http://www.amirifont.org/fatiha-colored.html No special markup has been used; the color information is embedded in a regular OpenType font.
Regards, Khaled From christoph.paeper at crissov.de Thu Apr 6 05:00:36 2017 From: christoph.paeper at crissov.de (=?UTF-8?Q?Christoph_P=C3=A4per?=) Date: Thu, 6 Apr 2017 12:00:36 +0200 (CEST) Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> Message-ID: <330428926.45676.1491472836694.JavaMail.open-xchange@app06.ox.hosteurope.de> Michael Everson : > > Standardized variation sequences are the best way to achieve this simply and > without needless duplication. :-) I still agree with this assertion. > > The distinction between white/black background might be of a different > > nature. If you have arranged everything in a grid with the correct matrix, > > then the color of the background is perhaps redundant, given that there is a > > uniform convention for it. > > Yes but you still want it to be reasonably legible when the OpenType ligatures > fail. This is were I don't follow. > ?????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????? > is far better than this: > ?????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????????????? > ??????????????????<< Is it the pawn or the queen that?s on the black square? > ?????????????????? > ?????????? It *looks* far better in a multi-line plain text environment, but that's a glyphic/typographic/stylistic argument. 
The semantics conveyed are redundantly encoded this way, so I wouldn't say it was far better. This alternating pattern is far more redundant than, say, pairs of opening and closing characters (brackets, quotation marks). As an aside, good fallback isn't something the UTC seems to be concerned with lately; see the emoji subregion flags, which are all represented by Waving Black Flag in legacy implementations (possibly followed by TOFU). > See? To parse this one you have to remember which of the white squares are the > alternating black ones. No, you only have to remember that A1, i.e. the lower left square initially occupied by a white rook, is black. For legal moves, the color pattern hardly matters, unless - regarding pawns - it was common practice to render the board turned, i.e. with the white player not at the bottom, but at the top (or left or right) side, and without alphabetic column and numeric row labels. > The colour of the matrix is NOT redundant for a human reader. That's what this proposal is all about. It's a good and sound proposal, except for the empty square. From wl at gnu.org Thu Apr 6 05:50:02 2017 From: wl at gnu.org (Werner LEMBERG) Date: Thu, 06 Apr 2017 12:50:02 +0200 (CEST) Subject: Coloured Punctuation and Annotation In-Reply-To: <20170406095417.GA21974@macbook> References: <0D5792F5-0D5E-47F4-BA9B-6FB4BC555BAA@evertype.com> <4d6aa681-3731-b646-43bf-cc37a3b930de@ix.netcom.com> <20170406095417.GA21974@macbook> Message-ID: <20170406.125002.1470028058588459599.wl@gnu.org> > This page should show colored Hamza, diacritical dots and vowel > marks on web browsers that support the MS color font format (currently > Firefox, Edge, and Internet Explorer on the latest Windows 10): > http://www.amirifont.org/fatiha-colored.html > > No special markup has been used; the color information is embedded > in a regular OpenType font. Very nice! It also works with Firefox on my GNU/Linux box.
Werner From khaledhosny at eglug.org Thu Apr 6 06:08:22 2017 From: khaledhosny at eglug.org (Khaled Hosny) Date: Thu, 6 Apr 2017 13:08:22 +0200 Subject: Coloured Punctuation and Annotation In-Reply-To: <20170406.125002.1470028058588459599.wl@gnu.org> References: <0D5792F5-0D5E-47F4-BA9B-6FB4BC555BAA@evertype.com> <4d6aa681-3731-b646-43bf-cc37a3b930de@ix.netcom.com> <20170406095417.GA21974@macbook> <20170406.125002.1470028058588459599.wl@gnu.org> Message-ID: <20170406110822.GB21974@macbook> On Thu, Apr 06, 2017 at 12:50:02PM +0200, Werner LEMBERG wrote: > > > This page should show colored Hamza, diacritical dots and vowel > > marks on web browsers that support the MS color font format (currently > > Firefox, Edge, and Internet Explorer on the latest Windows 10): > > http://www.amirifont.org/fatiha-colored.html > > > > No special markup has been used; the color information is embedded > > in a regular OpenType font. > > Very nice! It also works with Firefox on my GNU/Linux box. I think I worded this vaguely: it works with Firefox on all platforms (even on Android); the Windows 10 restriction applies to Internet Explorer only. Regards, Khaled From mpsuzuki at hiroshima-u.ac.jp Thu Apr 6 06:38:40 2017 From: mpsuzuki at hiroshima-u.ac.jp (suzuki toshiya) Date: Thu, 06 Apr 2017 20:38:40 +0900 Subject: [Unicode] Re: Implementation of ideographic description characters In-Reply-To: References: Message-ID: <58E628C0.70500@hiroshima-u.ac.jp> Maybe some precomposed glyphs (without standardized code points) are included in the font, and the registered IDS strings are internally converted to the corresponding glyph indices by the OpenType ligature feature. I guess the "composition"-like behaviour is only visible for the set of IDS strings registered in the font, and an unregistered IDS string would not be displayed as a single composed glyph. gfb hjjhjh wrote: > Seems like Source Han Serif has just implemented such functionality? Or is this just partial.
https://twitter.com/tualatrix/status/849178587680735232 > From verdy_p at wanadoo.fr Thu Apr 6 06:50:22 2017 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Thu, 6 Apr 2017 13:50:22 +0200 Subject: Implementation of ideographic description characters In-Reply-To: References: Message-ID: What is demonstrated here is how to build a CID-keyed font supporting the "unencoded glyphs" using an IDS pseudo-encoding plus the OpenType "ccmp" (or alternatively "liga") feature. It speaks about an Adobe registry ("ROS") for some supported lexical dictionaries, where encoded code points or unencoded glyphs (CID keys) can be mapped to subsets implementable in conforming fonts. http://blogs.adobe.com/CCJKType/2012/05/sp-ai0-ros.html It does not demonstrate how you can convert multiple glyphs and alter their metrics and placement to create composite glyphs. The actual composite glyphs are still manually tuned to build fonts. There have been some attempts to generate composite glyphs automatically, but so far these have always failed with traditional (serif-style) fonts. There has been some limited success with simplified glyphs (not using strokes with variable weight), but the generated glyphs are ugly because of their uneven stroke width (the smallest parts are difficult to read, the larger ones are too bold and should have their metrics reduced). The assumption that width and height metrics are equal for all parts of a single IDS composite gives wrong results. You first need to determine how many subcolumns and how many subrows will tile the general composition square, then assign separate "weights" to subcolumns and subrows by counting the number of strokes that interact on that dimension in the same subcolumn/subrow. With these weights you can then properly distribute the effective width of subcolumns and the effective height of subrows. With the total "weights" computed separately for each dimension, you then need to take the maximum value and make sure that stroke widths will not exceed this value.
Then you can place the glyphs in subcolumns/subrows, but you also need to be able to determine which parts of strokes are allowed to exceed their subcolumn/subrow limit: generally these are the thinnest ending nodes of the stroke, which may "touch" (intersect) some other strokes, provided the colliding strokes are not parallel or nearly parallel (so that their area of intersection will remain in a radius significantly smaller than the average stroke width). Such an algorithm is not implementable directly in fonts, but it should be possible to specify some complex metrics in base glyphs to allow some nodes to move slightly outside their definition box in a preferred direction. When glyphs are heavily narrowed horizontally or flattened vertically in their final rendering box (their size ratio is no longer a square), you need more specific hinting. The situation becomes more complex with some base glyphs for enclosure IDS (not just stacked side-by-side or on top of each other), but things may become simpler if these base strokes are themselves decomposed into IDS strings (using only side-by-side or top-to-bottom composition plus more basic strokes): the defined IDS dictionary ignores these subdecompositions because the standard IDS only uses the encoded base strokes, and the subdecomposition of encoded base strokes would need special codes for unencoded simple strokes. But there's still no standard hinting in OpenType fonts to instruct CJK glyphs so that their geometries may be properly adjusted while preserving the visual font weight and overall readability. So this requires specific glyph renderers, and these glyph renderers are still not used by generic text renderers. These algorithms are therefore used only as tools for generating collections of glyphs for fonts under construction. Complex glyphs are then manually tuned and various metrics are adjusted. Hinting instructions are no longer present in the final (OpenType or SVG) fonts.
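A minimal sketch of the weight-based distribution step described above, assuming that the available extent is shared out in direct proportion to per-cell stroke counts (the message does not spell out an exact formula). The name `distribute_extent` and the 1000-unit em square are illustrative assumptions, not part of any existing font tool.

```python
from typing import List, Sequence


def distribute_extent(total: float, stroke_counts: Sequence[int]) -> List[float]:
    """Split `total` units among subcolumns (or subrows) by stroke count.

    Assumption: each cell's share is proportional to the number of strokes
    counted for it along this dimension. Empty cells still get weight 1 so
    that they do not collapse to zero size.
    """
    if not stroke_counts:
        raise ValueError("at least one subcolumn/subrow is required")
    weights = [max(1, count) for count in stroke_counts]
    weight_sum = sum(weights)
    return [total * weight / weight_sum for weight in weights]


# Hypothetical left-right composite: a dense left part (3 strokes counted)
# next to a sparse right part (1 stroke counted), in a 1000-unit em square.
column_widths = distribute_extent(1000, [3, 1])
print(column_widths)
```

The same function would be called once per dimension (once for subcolumns, once for subrows). Stroke-width limits, overshoot of thin stroke endings and enclosure-type compositions are deliberately left out of this sketch.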
2017-04-06 9:28 GMT+02:00 gfb hjjhjh : > Seems like Source Han Serif has just implemented such functionality? Or > is this just partial. https://twitter.com/tualatrix/status/849178587680735232 > From verdy_p at wanadoo.fr Thu Apr 6 07:02:50 2017 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Thu, 6 Apr 2017 14:02:50 +0200 Subject: Coloured Punctuation and Annotation In-Reply-To: <20170406110822.GB21974@macbook> References: <0D5792F5-0D5E-47F4-BA9B-6FB4BC555BAA@evertype.com> <4d6aa681-3731-b646-43bf-cc37a3b930de@ix.netcom.com> <20170406095417.GA21974@macbook> <20170406.125002.1470028058588459599.wl@gnu.org> <20170406110822.GB21974@macbook> Message-ID: Nice indeed, even if there are some geometric glitches in the first complex (wide) ligature, in the black horizontal strokes at the bottom (I don't understand why they are partly broken; possibly this is caused by even/odd filling rules or by some incorrect hinting reducing the widths to zero). 2017-04-06 13:08 GMT+02:00 Khaled Hosny : > On Thu, Apr 06, 2017 at 12:50:02PM +0200, Werner LEMBERG wrote: > > > > > This page should show colored Hamza, diacritical dots and vowel > > > marks on web browsers that support the MS color font format (currently > > > Firefox, Edge, and Internet Explorer on the latest Windows 10): > > > http://www.amirifont.org/fatiha-colored.html > > > > > > No special markup has been used; the color information is embedded > > > in a regular OpenType font. > > > > Very nice! It also works with Firefox on my GNU/Linux box. > > I think I worded this vaguely: it works with Firefox on all platforms > (even on Android); the Windows 10 restriction applies to Internet Explorer > only. > > Regards, > Khaled > From 637275 at gmail.com Thu Apr 6 07:07:10 2017 From: 637275 at gmail.com (Rebecca T) Date: Thu, 6 Apr 2017 08:07:10 -0400 Subject: PETSCII mapping?
In-Reply-To: References: <38d70a68-aabe-a6d1-50cf-cbdf2f92b88f@ix.netcom.com> Message-ID: Here's a copy of the Teletext character set; it includes box-drawing characters for all combinations of a 2×3 grid of cells. 2⁶ = 64 characters, so we might need a new block. [1]: http://www.galax.xyz/TELETEXT/CHARSET.HTM From everson at evertype.com Thu Apr 6 07:13:46 2017 From: everson at evertype.com (Michael Everson) Date: Thu, 6 Apr 2017 13:13:46 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <7d222120-a2a8-56f0-bc6e-348887b95d9f@it.aoyama.ac.jp> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <20170405091030.116883a5@JRWUBU2> <7d222120-a2a8-56f0-bc6e-348887b95d9f@it.aoyama.ac.jp> Message-ID: <4717F5A1-9693-4656-8A6F-AC5E25B3DF18@evertype.com> On 6 Apr 2017, at 04:24, Martin J. Dürst wrote: >> http://evertype.com/standards/unicode-list/looking-glass-yellow-blue.png > > [OT] > It looks neat. But I noticed three very small gaps in each of the top and bottom borders. I have not done anything to optimize display in these fonts. They were proof-of-concept fonts for the sequences. It's easy to fix those: just drag the glyph and make it a bit longer. One does the same thing in Arabic fonts. > Also, it's probably not the best choice of colors, because my eyes tend to associate the yellow figures with white, and the blue ones with black, but thinking it through makes it clear that it's the other way round. I just picked the process colours cyan and yellow, but it was Richard who had specified the colours: "Now, what happens to the two scheme if rendered with yellow text ('foreground') on a blue background?"
Michael Everson From christoph.paeper at crissov.de Thu Apr 6 07:19:24 2017 From: christoph.paeper at crissov.de (Christoph Päper) Date: Thu, 6 Apr 2017 14:19:24 +0200 (CEST) Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> Message-ID: <59398711.46744.1491481164274.JavaMail.open-xchange@app06.ox.hosteurope.de> Mark Davis: > > I'm looking forward to similar postings on checkers and go pieces. (...) > And I'm looking also forward to the ?+ZWJ+?? (etc) proposal. Well, actually ... Garth Wallace made an important observation in : >> Currently, chess fonts can be (roughly) divided into "diagram fonts" and >> "notation fonts". The major goal of Michael Everson's proposal to introduce standardized sequences with variation selectors 1 and 2 (U+FE00/1) for chess piece characters (primarily U+2654-F), as far as I understand it, is to ensure "diagram" glyph design. This means fixed-width figurines centered in the character cell, with means for square color and board border elements (incl. labels), whereas "notation" style usually has proportional figurines sitting on the baseline. The square color is ignored in standard chess notation, where fields are conventionally known by their alphabetic column index ("file", A-H for a standard checkerboard) and numeric row index ("rank", 1-8), i.e. A1 through H8, which are virtually never styled as "?A1", "?B1", "B2", whereas figurine characters may either augment or substitute conventional letter symbols.
- Black/dark squares are those whose file and rank are either both odd or both even. - White/light squares are those whose file is odd and rank is even, or vice versa. Corollary: The glyph background is almost only important within diagram notation. Diagrams may only show select squares, so the color of the first or last one and hence intermediate ones cannot necessarily be deduced from the immediate context. (They may be implied by row and column labels, which is simple for a sighted human reader, but complex for computers and blind readers.) Although Michael Everson readily dismisses any connection to emojis, e.g. L2/16-021 or L2/16-087+088, and hence the Emoji and Emoji_Presentation character properties as well as sequences with variation selectors 15 and 16 (U+FE0E/F), normal emoji design actually matches "diagram" notation quite nicely in that all emoji glyphs are rendered within an (ideographic / em) square. Black and white squares are also already available as emojis in small U+25AA/B, medium-small U+25FE/D, medium U+25FC/B and large U+2B1B/C. The last ones would probably be preferred. Only the first ones are default text style characters. The characters for empty squares from the proposal, U+25A8/1, have no emoji representation yet. I've suggested to use the hatching characters U+25Ax for their colors as in heraldic tinctures, which relate U+25A8 to Purple ("purpure"). Without the need for ZWJ sequences, Opentype fonts can employ their Contextual Alternates `calt` feature to select the correct background color in diagram notation: In a sequence of up to eight chess pieces without an empty square with explicit color, an initial U+2656-FE0F White Rook, U+2654-FE0F White King, U+265B-FE0F Black Queen or U+265F-FE0F Black Pawn would default to a black background, U+2659-FE0F White Pawn, U+2655 White Queen, U+265A-FE0F Black King or U+265C-FE0F Black Rook to a white background. 
Other than that, each character uses the alternate glyph with opposing background color from its preceding (left-side) glyph. The empty squares work as explicit anchors. A font intended for print wouldn't have to use any fancy colors or effects for emoji glyphs, but only infer the centered squared presentation from a variation selector, so it could still use proportional glyph in running text. In conclusion, although I support the proposal in principle, I strongly suggest to consider to use established VS-16 and implicit contextual backgrounds instead of arbitrary VS-1 and VS-2 with explicit backgrounds. (With only VS-16, my previous remarks about the representation of empty squares would be somewhat moot. Technically, it's still redundant, but at least it would be consistent.) References -------------- - L2/16-021: http://unicode.org/L2/L2016/16021-game-pieces-emoji.pdf - L2/16-087: http://unicode.org/L2/L2016/16087-provisional-value-for-emoji.pdf - L2/16-088: http://unicode.org/L2/L2016/16088-chars-for-emoji-provisional.pdf * https://docs.google.com/spreadsheets/d/1-XLoueD__NZtOPNz4bWl_HwOmwGIt_6aEaWV-l_I0UQ/ (broken) * https://docs.google.com/spreadsheets/d/1txhi8XYKFMkCaOOFMI2z1tQkNwhzkUN-htofj-oJ1GM/ (original) * https://docs.google.com/spreadsheets/d/1KQDH9uArJr-8m4UvAEd02ixaX_-wauSwjy9qYCwIOvE/ (extended) - Hatching colors: https://github.com/Crissov/unicode-proposals/issues/222 etc. From christoph.paeper at crissov.de Thu Apr 6 07:37:05 2017 From: christoph.paeper at crissov.de (=?UTF-8?Q?Christoph_P=C3=A4per?=) Date: Thu, 6 Apr 2017 14:37:05 +0200 (CEST) Subject: Eszett variation sequence Message-ID: <1564892666.46890.1491482226024.JavaMail.open-xchange@app06.ox.hosteurope.de> U+00DF Latin Letter Sharp S ??? has at least two rather different visual styles resulting from a ligature of either long and round lowercase S, ??s?, or of long S and normal or tailed lowercase Z, ??z? or ????. 
Most modern typeface designs follow the first style and sometimes the right-hand side is quite distinct from the shape of the round S in the same font. In some cases it makes sense to distinguish the glyphic origins, because, by orthographic or graphotactic means, for instance, an _sz_ digraph is appropriate in different places than an _ss_ repeated letter. Would it make sense to propose standardized variation sequences for these styles or should this be left to font features like `cv##` or `calt` in Opentype? From everson at evertype.com Thu Apr 6 07:41:01 2017 From: everson at evertype.com (Michael Everson) Date: Thu, 6 Apr 2017 13:41:01 +0100 Subject: Coloured Punctuation and Annotation In-Reply-To: <20170406054107.20e40bd5@JRWUBU2> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <20170405091030.116883a5@JRWUBU2> <20170405224817.4149f845@JRWUBU2> <9FCA9B1F-00D7-459E-8567-59589609A708@evertype.com> <20170406054107.20e40bd5@JRWUBU2> Message-ID: <068DC35F-4A31-44FA-9EA7-ADDDDE06D450@evertype.com> > On 6 Apr 2017, at 05:41, Richard Wordingham wrote: > > On Thu, 6 Apr 2017 01:11:09 +0100 > Michael Everson wrote: > >> On 5 Apr 2017, at 22:48, Richard Wordingham >> wrote: >> >>> I tried to read it from UTS#51 ?Unicode Emoji', which is not part of TUS, but I couldn't deduce that a font that enables U+10B99 PSALTER PAHLAVI SECTION MARK to have exactly two (as opposed to none or four) red dots is in breach of the guidelines therein. >> >> Kindly explain how ANY font could do this. > > Is this a trick question? No. Here is an example of a font available in two variants. In one variant, all those grey swirls are fused to the letters, and it can all be printed in black or one colour ink. 
http://cdn.myfonts.net/s/aw/original/255/0/131020.png There is also a second set of fonts included which separates the swirls from the letters, and those can be used in typesetting to get the two-colour effect you see here. That can't really be done using standard encoding. You'd probably see IIVVOORRYY in the backing store for that word, with alternate letters set in the letter font and the swirl font respectively. Emoji-style colour fonts use other mechanisms for colour. Michael Everson From everson at evertype.com Thu Apr 6 07:25:46 2017 From: everson at evertype.com (Michael Everson) Date: Thu, 6 Apr 2017 13:25:46 +0100 Subject: PETSCII mapping? In-Reply-To: References: <38d70a68-aabe-a6d1-50cf-cbdf2f92b88f@ix.netcom.com> Message-ID: <30B5FF67-D4A4-499F-9447-E56FBE7C6117@evertype.com> On 6 Apr 2017, at 04:32, Rebecca Bettencourt wrote: > We do have to provide Unicode with fonts, I believe. We can use an existing C64 font, such as Pet Me. Or, we can create a new font with vectorized versions of the characters. I'll help with that; we should harmonize with other characters in the standard. At some point this should be taken off the main list since discussion will get very detailed very quickly.
Michael Everson From everson at evertype.com Thu Apr 6 07:57:50 2017 From: everson at evertype.com (Michael Everson) Date: Thu, 6 Apr 2017 13:57:50 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <330428926.45676.1491472836694.JavaMail.open-xchange@app06.ox.hosteurope.de> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <330428926.45676.1491472836694.JavaMail.open-xchange@app06.ox.hosteurope.de> Message-ID: <85BB6AAE-8319-446C-9C42-F6535CF043D4@evertype.com> On 6 Apr 2017, at 11:00, Christoph P?per wrote: > > Michael Everson : >> >> Standardized variation sequences are the best way to achieve this simply and without needless duplication. :-) > > I still agree with this assertion. So do I.. ;-) >> Yes but you still want it to be reasonably legible when the OpenType ligatures fail. > > This is were I don't follow. Why wouldn?t you want it to be reasonably legible when the OpenType ligatures can?t be displayed? ?????????? ?????????????????? ?????????????????? ?????????????????? ?????????????????? ?????????????????? ?????????????????? ?????????????????? ?????????????????? ?????????? is far better than this: ?????????? ?????????????????? ?????????????????? ?????????????????? ?????????????????? ?????????????????? ?????????????????? ??????????????????<< Is it the pawn or the queen that?s on the black square? ?????????????????? ?????????? > It *looks* far better in a multi-line plain text environment, but that's a glyphic/typographic/stylistic argument. It?s an argument for legibility. > The semantics conveyed are redundantly encoded this way, so I wouldn't say it was far better. 
This alternating pattern is far more redundant than, say, pairs of opening and closing characters (brackets, quotation marks). It?s not redundant to the reader. The reader of the second one has to remember that the dark square is the lower left, and then count in order to know the colour of any given square. The reader of the first one doesn?t have to do this, because we have both ?? and ??, two encoded characters, and we use them for convenience. > Aside, good fallback isn't something the UTC seems to be concerned with lately, Inconsistency on the part of the UTC is not my concern. I have to > see emoji subregion flags that are all represented by Waving Black Flag in legacy implementations (possibly followed by TOFU). Yes, well, that?s an example of a decision that didn?t have good oversight or feedback, perhaps. I do know that falling back to a black flag rather than to the Union flag for Wales, England, and Scotland doesn?t seem very sensible. Leaving out the de-facto flag of Northern Ireland wasn?t very wise either, though nobody asked the UK or Irish representatives of SC2 their opinion about it. >> See? To parse this one you have to remember which of the white squares are the alternating black ones. > > No, you only have to remember that A1, i.e. the lower left square initially occupied by a white rook, is black. You have to remember that, and then you have to count every other square in whatever direction to know what colour a given square is. That?s not very user-friendly. And it?s easy to be user friendly. Just use both ?? and ??. > For legal moves, the color pattern hardly matters, unless - regarding pawns - it was common practice to render the board turned, i.e. with the white player not at the bottom, but at the top (or left or right) side, and without alphabetic column and numeric row labels. For legal moves, no. But this is text. The table is meant to be read. Since it is, good fallback is better than bad fallback. 
>> The colour of the matrix is NOT redundant for a human reader. > > That's what this proposal is all about. It's a good and sound proposal, except for the empty square. Do you mean ?except for the light and dark squares without a piece on them? or ?except for the light square without a piece on it?? The convention is to have two alternating shades on the squares and there?s no advantage to the human reader to quash this distinction. What is your specific counter-proposal? Michael Everson From everson at evertype.com Thu Apr 6 08:26:56 2017 From: everson at evertype.com (Michael Everson) Date: Thu, 6 Apr 2017 14:26:56 +0100 Subject: Eszett variation sequence In-Reply-To: <1564892666.46890.1491482226024.JavaMail.open-xchange@app06.ox.hosteurope.de> References: <1564892666.46890.1491482226024.JavaMail.open-xchange@app06.ox.hosteurope.de> Message-ID: Can you give an example of any font which has two glyphs in it for ?? I mean, I was in Berlin and I took this picture: http://evertype.com/standards/unicode-list/seydlitzstr.jpg Do you think we should encode a Latin straight y (like the Cyrillic one) so we can write Se?dlitzstra??e? > On 6 Apr 2017, at 13:37, Christoph P?per wrote: > > U+00DF Latin Letter Sharp S ??? has at least two rather different visual styles resulting from a ligature of either long and round lowercase S, ??s?, or of long S and normal or tailed lowercase Z, ??z? or ????. Most modern typeface designs follow the first style and sometimes the right-hand side is quite distinct from the shape of the round S in the same font. In some cases it makes sense to distinguish the glyphic origins, because, by orthographic or graphotactic means, for instance, an _sz_ digraph is appropriate in different places than an _ss_ repeated letter. > > Would it make sense to propose standardized variation sequences for these styles or should this be left to font features like `cv##` or `calt` in Opentype? 
From christoph.paeper at crissov.de Thu Apr 6 08:36:21 2017 From: christoph.paeper at crissov.de (Christoph Päper) Date: Thu, 6 Apr 2017 15:36:21 +0200 (CEST) Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <85BB6AAE-8319-446C-9C42-F6535CF043D4@evertype.com> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <330428926.45676.1491472836694.JavaMail.open-xchange@app06.ox.hosteurope.de> <85BB6AAE-8319-446C-9C42-F6535CF043D4@evertype.com> Message-ID: <1164365392.47233.1491485781803.JavaMail.open-xchange@app06.ox.hosteurope.de> > Michael Everson wrote on 6 April 2017 at 14:57: > >> That's what this proposal is all about. It's a good and sound proposal, >> except for the empty square. > > Do you mean "except for the light and dark squares without a piece on them" or > "except for the light square without a piece on it"? I meant "except for the squares without a piece on them". > What is your specific counter-proposal? Either VS-16 and Emoji property for pieces, being used with U+2B1B/C as empty squares, as explained in a subsequent message, or VS-1 and VS-2 sequences with a single codepoint to represent an empty square (consistent with pieces).
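[Editorial illustration] The first alternative above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the function names `is_dark` and `encode_rank` are hypothetical, the parity rule is the one given earlier in the thread (a square is dark when file and rank are both odd or both even, so A1 is dark), and whether chess-piece characters followed by U+FE0F are valid sequences is exactly what is under discussion here, not something the standard is known to define.

```python
# Sketch only: helper names are illustrative, not taken from any proposal.
VS16 = "\uFE0F"         # VARIATION SELECTOR-16 (emoji presentation)
DARK_EMPTY = "\u2B1B"   # BLACK LARGE SQUARE, used as an empty dark square
LIGHT_EMPTY = "\u2B1C"  # WHITE LARGE SQUARE, used as an empty light square


def is_dark(file: str, rank: int) -> bool:
    """Return True when file and rank have the same parity (A1 is dark)."""
    file_index = ord(file.upper()) - ord("A") + 1
    return file_index % 2 == rank % 2


def encode_rank(rank: int, pieces: dict) -> str:
    """Encode files A-H of one rank.

    `pieces` maps a file letter to a chess-piece character (U+2654..U+265F).
    Occupied squares get the piece plus VS-16 (assumed sequence); the
    background is left implicit. Empty squares get an explicit large square.
    """
    cells = []
    for file in "ABCDEFGH":
        if file in pieces:
            cells.append(pieces[file] + VS16)
        else:
            cells.append(DARK_EMPTY if is_dark(file, rank) else LIGHT_EMPTY)
    return "".join(cells)


# Rank 1 with a white rook on A1 and a white king on E1.
row = encode_rank(1, {"A": "\u2656", "E": "\u2654"})
```

In this scheme the empty squares act as the explicit colour anchors, while the background of an occupied square would have to be inferred by the font or by the reader.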
From everson at evertype.com Thu Apr 6 08:50:41 2017 From: everson at evertype.com (Michael Everson) Date: Thu, 6 Apr 2017 14:50:41 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <59398711.46744.1491481164274.JavaMail.open-xchange@app06.ox.hosteurope.de> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> <59398711.46744.1491481164274.JavaMail.open-xchange@app06.ox.hosteurope.de> Message-ID: <8C8ED76B-C438-498C-892D-58E20DD484D6@evertype.com> On 6 Apr 2017, at 13:19, Christoph Päper wrote: > > Although Michael Everson readily dismisses any connection to emojis, e.g. L2/16-021 or L2/16-087+088, and hence the Emoji and Emoji_Presentation character properties as well as sequences with variation selectors 15 and 16 (U+FE0E/F), normal emoji design actually matches "diagram" notation quite nicely in that all emoji glyphs are rendered within an (ideographic / em) square. No, no. Emojis are something else very specific and very expensive with implications for vendors and having to do with colour. Look at zero: U+0030 - 0 - DIGIT ZERO; U+0030 FE00 - short diagonal stroke form; U+0030 FE0E - text style; U+0030 FE0F - emoji style. Emoji is something else. Emoji is a fine thing, but it's not chessboard typesetting. Michael Everson From everson at evertype.com Thu Apr 6 09:07:20 2017 From: everson at evertype.com (Michael Everson) Date: Thu, 6 Apr 2017 15:07:20 +0100 Subject: Standaridized variation sequences for the Desert alphabet?
In-Reply-To: References: <20170321104104.665a7a7059d7ee80bb4d670165c8327d.c6e0d0ee2d.wbe@email03.godaddy.com> <84F24B3C-9884-432C-B71F-B8D9D283DE9B@evertype.com> <24975108-52a4-cda4-737d-6a41ff1b5c14@it.aoyama.ac.jp> <17D37ACB-9269-4537-AE60-71BB6CA42366@evertype.com> <7686cee6-1b4f-d1a6-8cd7-09859757465c@it.aoyama.ac.jp> <587FFDFA-CAAE-4F81-B60D-94EB9C550151@evertype.com> <2f05d26e-9d3f-4670-f667-1daf1cd53063@it.aoyama.ac.jp> <6C843948-F554-4C52-B103-36508595C4FB@evertype.com> Message-ID: On 6 Apr 2017, at 08:01, Martin J. D?rst wrote: > Hello Michael, Hi Martin. >> It?s as though you?d not participated in this work for many years, really. > > Well, looking back, my time commitment to Unicode has definitely varied over the years. But that might be true for everybody. I just get frustrated when everyone including the veterans seems to forget every bit of precedent that we have for the useful encoding of characters. > What's more important is that Unicode covers such a wide range of areas, and not everybody has the same experience or knowledge. If we did, we wouldn't need to work together; it would be okay to just have one of us. Indeed, what's really very valuable and interesting in this work is the many very varied backgrounds and experiences everybody has. I do not disagree, particularly. >>> - That suggests that IF this script is in current use, >> >> You don?t even know? You?re kidding, right? > > Everything is relative. And without being part of the user community, it's difficult to make any guesses. Hm, but you did make a guess. >> Yeah, it doesn?t ?seem? anything but a whole lot of special pleading to bolster your rigid view that the glyphs in question can be interchangeable because of the sounds they may represent. 
> > I don't remember every claiming that the glyphs must be used interchangeably, only that we should carefully examine whether they are or not, and that because they represent the same sound (in a phonetic alphabet, as it is) We don?t encode sounds, we encode writing systems, the marks on paper, and in Latinate scripts (I?ll ignore CJK) we have never unified characters which are formed of historical ligatures like these? I guess ?s and ?? might possibly be the exception, but I think nobody would find a use for distinguishing them. > and are shown in the same position in alphabet tables, we shouldn't a priori exclude such a possibility. As it happens, at least one writer used the ??-with-stroke (encoded for /ju;/) for /??/, but I wouldn?t substitute the ??-with-stroke (??) for it in a diplomatic transcription. Normalized spelling is something else, but the orthography of Deseret manuscripts themselves is what it is. Subtle things like the dialect of writers can be gleaned from them, and letterforms may help to date a text. >>> - There may not be enough information to understand how the creators and early users of the script saw this issue, >> >> Um, yeah. As if there were for Phoenician, or Luwian hieroglyphs, right? > > Well, there's well over an order of magnitude difference in the time scales involved. The language that Deseret is used to write is still in active use, including in this very discussion. Quite different from Phoenician or Luwian hieroglyphs. The language is still in use, but we have no access to the minds of the dead users of Deseret unless they write about their orthographic practices explicitly. Accurate transcription can tell us if the speaker was from Boston or Britain, if for instance they regularly drop -r- in words like ?start?. 
> In addition, we have meta-information such as alphabet tables, which we may not have for the scripts you mention, as well as the fact that printing technology may have forced a better identification of what's a character and what not than inscriptions and other older technologies. Well, we know there was a script reform in Deseret with regard to these and some other characters. >> Nobody worried about the number of modern users of the Insular letters we encoded. Why put such a constraints on users of Deseret? ?? ?? ?? ? ?? ?? ??. > > Because it's modern users, and future users, not users some hundred years or so ago, that will use the encoding. In the case of Insular letters, my guess is that nobody wants to translate/transcribe xkcd, for example, whereas there is such a transcription for Deseret: > http://www.deseretalphabet.info/XKCD/ Modern users use the insular letterforms for accurate representation of some texts. John does the XKCD transcriptions, I believe, and he doesn?t use the diphthong letters anyway, and that?s his orthographic practice. >> Most readers and writers of Deseret today use the shapes that are in their fonts, which are those in the Unicode charts, and most texts published today don?t use the EW and OI ligatures at all, because that?s John Jenkins? editorial practice. > > So I was wrong to write "modern practitioners", and should have written "modern publishers" or "modern published texts". Or is the impression that I get from what you wrote above wrong that most texts published these days are edited by John, or by people following his practice? John is active in the area of making and publishing modern editions in Deseret. Ken has worked in the area of manuscripts and their represntation. > I don't remember denying the value of separate encodings for historic research. I only wanted to make sure that present-day use isn't inconvenienced to make historic research easier. 
Adding new characters won?t affect people who don?t want to use those characters in particular, though. > If the claims are correct that present-day usage is mostly a reconstruction based on the Unicode encoding and the Unicode sample glyphs, then I'm fine with helping historic research. OK, good. Those modern users who want to use ?? and ?? will still be able to do so. Those who want to use the ??-with-stroke and ??-with-stroke characters will be able to do so if they are encoded. And there are some other letters not yet encoded. >> This is exactly the same thing as the medievalist Latin abbreviation and other characters we encoded. There is neither sense nor logic nor utility in trying to argue for why editors of Deseret documents shouldn?t have the same kinds of tools that medievalists have. And as far as medievalist concerns go, many of the characters are used by relatively few researchers. Some of the characters we encoded are used all over Europe at many times. Some are used only by Nordicists, some by Celticists, and some by subsets within the Nordicist and Celticist communities. > > Maybe, maybe not. If e.g. somebody came and said that they wanted to disunify the ?s and ?z ligatures for (German) ? in order to better analyze some old manuscripts, and the modern users from hereon had to make sure they used the right one depending on the font they used, then I'm sure a lot of Germans would complain quite clearly, because it would make their current use more complicated. That?s not true, though. We have both s and ? encoded, and we gave both r and ? encoded, and the long s and r rotunda do not bother any modern user of the Latin script or force them to alter their orthography. >> Harm? What harm? Recently the UTC looked at a proposal for capital letters for ? and ?. Evidence for their existence was shown. One person on the call to the UTC said he didn?t think anyone needed them. Two of us do need them. 
I needed them last weekend and I had to use awkward workarounds. They weren't accepted. There wasn't any good rationale for the rejection. I mean, the letters exist. Case is a normal function of the script. But they weren't accepted. For the guy who didn't think he needed them, well, so what? If they're encoded, he doesn't have to use them. > > I have no idea what the reasons for this were, because I wasn't involved in the discussion. As I recall, because one person ended up agreeing "We don't need to encode characters for failed orthographies". The entire Deseret script is a failed orthography of course, and that viewpoint ignores (in this case) the historical importance of Pinyin and its development. But from a functional point of view I needed capitals for those two letters (not related to early Pinyin) and had to use workarounds. That is not a satisfactory situation. >> People who use Deseret use it for historical purposes and for cultural reasons. Everybody in Utah reads English in standard Latin orthography. > > I haven't been in Utah except for a one-time flight change in Salt Lake City more than 10 years ago. So please don't assume that everybody on this list knows the state of usage for all the scripts that get discussed. OK, but https://en.wikipedia.org/wiki/Deseret_alphabet is a pretty good article. >> I didn't "come up" with separate historical derivations for the four characters in question. > > I didn't mean "come up" in the sense of "make up out of thin air", but in the sense of "discover". If it wasn't you but somebody else who discovered these derivations, please let us know. All it took was a look at https://en.wikipedia.org/wiki/Deseret_alphabet#/media/File:Deseret_glyphs_ew_and_oi_transformation_from_1855_to_1859.svg to KNOW without question the derivation of these letters, namely ??/??/??/?? with the stroke of ??. It's blindingly obvious!
:-) >>>> What Deseret has is this: >>>> >>>> 10426 DESERET CAPITAL LETTER LONG OO WITH STROKE >>>> * officially named ?ew? in the code chart >>>> * used for ew in earlier texts >>>> 10427 DESERET CAPITAL LETTER SHORT AH WITH STROKE >>>> * officially named ?oi? in the code chart >>>> * used for oi in earlier texts >>>> 1xxxx DESERET CAPITAL LETTER LONG AH WITH STROKE >>>> * used for oi in later texts >>>> 1xxxx DESERET CAPITAL LETTER SHORT OO WITH STROKE >>>> * used for ew in later texts >>> >>> Currently, it has this: >>> >>> 10426 ?? DESERET CAPITAL LETTER OI >>> >>> 10427 ?? DESERET CAPITAL LETTER EW >> >> You are being deliberately obtuse. Note that I stated clearly ?officially named ?ew/oi? in the code chart?. > > Well, if you think I'm deliberately obtuse, then I'd have to say that I think you're (deliberately?) obscure. I was making a point; sorry if you didn?t catch it. The names as given in that list above are the kinds of descriptions of the letters that we often give. We have LATIN LETTER THORN WITH STROKE. We might have named it LATIN LETTER THAT. > You repeat hypothetical, non-existing names They?re descriptive of the letter, not of the diphthong. > such as "DESERET CAPITAL LETTER LONG OO WITH STROKE" over and over, using capitals to make then look like the actual names, and bury the actual names (such as "DESERET CAPITAL LETTER OI") by shortening and lowercasing them. Well, I lowercased them because lowercase is used in informative notes. Anyway, sorry if my rhetoric failed to hit the mark. :-) > But even if that weren't the case, we would still want to treat it as one and the same character, with a single code point. It would still be hopelessly impractical for Germans to use two different characters, when they only can decide which character to type once they have seen the actual character in the font they type, and have to potentially change the character if they change the font. But even if we did encode an ?? 
letter (similar to the T-Z ligature-letter ? ? we did encode) it would be encoded for a special purpose, and wouldn?t be intended to affect standard German. Look, we can write sch?n and we can write ?cho?n and nobody?s affected by the latter. > And while we currently have no evidence that Deseret had developed a typographic tradition where some type styles would use one set of ligatures, and other styles would use another set, it wouldn't be possible to reject this possibility without actually trying to find evidence one way or another. There was type during the heyday of Deseret use, and evidence for several sorts but no typographic ?tradition? really. That?s happened latterly. >> Your argument seemed to be based solely on the use of the letters for the sounds, ignoring the historical derivation and the facts of the spelling reform in Deseret. > > The spelling reform is fine. What is important is what happened after the spelling reform. Were the 1855 variants replaced by the 1859 variants? Was it two different traditions, separated in some way or other? Or was it in effect more like a mixture of both? > (or maybe we don't know, or it's a little of everything?) Where they were replaced, it helps to identify the provenance of a text. There are also some texts where there?s a bit of a mix. In fact adding some letters to the standard for Deseret will improve users? ability to represent the historical texts. For those relatively few people who are creating new texts now, they will be able to choose what letters they need. Some, like John, don?t use the diphthong letters at all. In fact most modern readers read John?s texts, so few would probably worry about the other letters. > Examining these questions and bringing the available data to light and clarifying the limits of our data and our understanding is very important. 
Only in this way can we make decisions that will hopefully be valid for the rest of the existence of Unicode (which might be quite a few decades at least), or decisions that at a minimum might be evaluated as "well, they didn't know better then", rather than as "they definitely should have known better, even then?. Really, my practice when approaching this is the same as it has been for additions to Latin or Greek or Cyrillic. I?m quite consistent. :-) >> A proposal will be forthcoming. I want to thank several people who have written to me privately supporting my position with regard to this topic on this list. I can only say that supporting me in public is more useful than supporting me in private. > > I'm looking forward to your proposal. I hope it clearly indicates why (you think) there's no danger of inconveniencing modern practitioners. To be honest, we didn?t have to say ?r rotunda will not affect modern users of the Latin script?, now, did we? :-) Today I received Ken?s book on the Deseret-script English-Hopi vocabulary. This will help us move forward with a proposal. Best, Michael Everson From wjgo_10009 at btinternet.com Thu Apr 6 05:34:47 2017 From: wjgo_10009 at btinternet.com (William_J_G Overington) Date: Thu, 6 Apr 2017 11:34:47 +0100 (BST) Subject: Coloured Punctuation and Annotation In-Reply-To: <20170406054107.20e40bd5@JRWUBU2> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <20170405091030.116883a5@JRWUBU2> <20170405224817.4149f845@JRWUBU2> <9FCA9B1F-00D7-459E-8567-59589609A708@evertype.com> <20170406054107.20e40bd5@JRWUBU2> Message-ID: <3805977.19700.1491474887446.JavaMail.defaultUser@defaultHost> The following post may be of interest. 
http://www.unicode.org/mail-arch/unicode-ml/y2002-m06/0337.html It is part of a thread from 2002 about the possibility of chromatic fonts. I wonder if it would be possible please for Unicode to have a Chromatic property that works exactly like the emoji property in the sense of expressing that a colour version of the glyph is being requested so that characters such as the one to which Daniel refers in his post linked above can be listed in The Unicode Standard as having a variation selector for a Chromatic version as that would be a respectful terminology for characters used in such applications. I also found this post of mine. http://unicode.org/mail-arch/unicode-ml/y2002-m06/0403.html Something to think about? William Overington Thursday 6 April 2017 From wjgo_10009 at btinternet.com Thu Apr 6 06:00:36 2017 From: wjgo_10009 at btinternet.com (William_J_G Overington) Date: Thu, 6 Apr 2017 12:00:36 +0100 (BST) Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <19941195.63023.1491421557775.JavaMail.defaultUser@defaultHost> References: <11942252.59673.1491420041683.JavaMail.root@webmail43.bt.ext.cpcloud.co.uk> <19941195.63023.1491421557775.JavaMail.defaultUser@defaultHost> Message-ID: <17871308.21901.1491476437014.JavaMail.defaultUser@defaultHost> Here is a link to a chess-type board in a garden in France shown in Google Street View. https://www.google.co.uk/maps/@47.1030089,0.3209105,3a,75y,24.39h,75.31t/data=!3m6!1e1!3m4!1sb0b73sCdjBaGofBYjXOy8Q!2e0!7i13312!8i6656 One can move around the board within Google Street View. How could we encode that? :-) I know that in reality we probably would not, but it is an interesting thought experiment? Just for encoding fun! William Overington Thursday 6 April 2017 From mark at macchiato.com Thu Apr 6 10:05:37 2017 From: mark at macchiato.com (Mark Davis ☕️) Date: Thu, 6 Apr 2017 17:05:37 +0200 Subject: Standaridized variation sequences for the Desert alphabet?
In-Reply-To: References: <20170321104104.665a7a7059d7ee80bb4d670165c8327d.c6e0d0ee2d.wbe@email03.godaddy.com> <84F24B3C-9884-432C-B71F-B8D9D283DE9B@evertype.com> <24975108-52a4-cda4-737d-6a41ff1b5c14@it.aoyama.ac.jp> <17D37ACB-9269-4537-AE60-71BB6CA42366@evertype.com> <7686cee6-1b4f-d1a6-8cd7-09859757465c@it.aoyama.ac.jp> <587FFDFA-CAAE-4F81-B60D-94EB9C550151@evertype.com> <2f05d26e-9d3f-4670-f667-1daf1cd53063@it.aoyama.ac.jp> <6C843948-F554-4C52-B103-36508595C4FB@evertype.com> Message-ID: On Thu, Apr 6, 2017 at 4:07 PM, Michael Everson wrote: > I just get frustrated when everyone including the veterans seems to forget > every bit of precedent that we have for the useful encoding of characters. > Nobody's forgetting anything. Simply because people disagree with you doesn't mean they are forgetful or stupid. One could just as well respond that you are forgetting that Unicode is *not* a glyph standard. Merely because a character have multiple shapes is not grounds for disunifying it. Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at ewellic.org Thu Apr 6 10:32:26 2017 From: doug at ewellic.org (Doug Ewell) Date: Thu, 06 Apr 2017 08:32:26 -0700 Subject: Proposal to add standardized variation sequences for chess notation Message-ID: <20170406083226.665a7a7059d7ee80bb4d670165c8327d.029f4fd078.wbe@email03.godaddy.com> Michael Everson wrote: > Leaving out the de-facto flag of Northern Ireland wasn't very wise > either, Nor over a thousand flags of regions that don't happen to compete independently in international sports. But anyway. -- Doug Ewell | Thornton, CO, US | ewellic.org From everson at evertype.com Thu Apr 6 11:11:01 2017 From: everson at evertype.com (Michael Everson) Date: Thu, 6 Apr 2017 17:11:01 +0100 Subject: Standaridized variation sequences for the Desert alphabet?
In-Reply-To: References: <20170321104104.665a7a7059d7ee80bb4d670165c8327d.c6e0d0ee2d.wbe@email03.godaddy.com> <84F24B3C-9884-432C-B71F-B8D9D283DE9B@evertype.com> <24975108-52a4-cda4-737d-6a41ff1b5c14@it.aoyama.ac.jp> <17D37ACB-9269-4537-AE60-71BB6CA42366@evertype.com> <7686cee6-1b4f-d1a6-8cd7-09859757465c@it.aoyama.ac.jp> <587FFDFA-CAAE-4F81-B60D-94EB9C550151@evertype.com> <2f05d26e-9d3f-4670-f667-1daf1cd53063@it.aoyama.ac.jp> <6C843948-F554-4C52-B103-36508595C4FB@evertype.com> Message-ID: On 6 Apr 2017, at 16:05, Mark Davis ☕️ wrote: >> I just get frustrated when everyone including the veterans seems to forget every bit of precedent that we have for the useful encoding of characters. > > Nobody's forgetting anything. Simply because people disagree with you doesn't mean they are forgetful or stupid. One could just as well respond that you are forgetting that Unicode is not a glyph standard. Merely because a character have multiple shapes is not grounds for disunifying it. The ignoring of reasonable precedent does not make the UTC seem reasonable. In terms of Deseret, the suggestion that characters ??/??/??/?? with a stroke derived from ?? are glyph variants of one another simply makes no sense at all. We have honed over many years our understanding of writing systems, and saying "Oh, ??-with-stroke and ??-with stroke are variant shapes of the same thing"? Anyone can see that this is not true. The vexing thing is that one can never rely on consistency in the UTC's approaches to any proposal. I have discussed this with other successful and prolific proposal writers. It's always a coin-toss as to how a proposal will be viewed. The recent instance of adding attested capital letters for ? and ? is a perfect example. We have seen before some desire to see evidence for casing pairs (though often it has not been sought.) We have never before seen evidence for casing pairs to be thrown out.
Case, of course, is a function of the Latin script, just as it is of Greek and Cyrillic and Armenian and Cherokee and both Georgian scripts and others. The UTC's refusal to encode attested capitals for ? and ? simply makes no sense. Your statement "Merely because a character have multiple shapes is not grounds for disunifying it" suggests an underlying view that "everything is already encoded and additions are disunifications". I do not subscribe to this view. Michael Everson From kent.karlsson14 at telia.com Thu Apr 6 11:24:48 2017 From: kent.karlsson14 at telia.com (Kent Karlsson) Date: Thu, 06 Apr 2017 18:24:48 +0200 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <03A47EDF-0C59-4845-A8B0-23D0A9739D15@evertype.com> Message-ID: Den 2017-04-06 03:08, skrev "Michael Everson" : > On 6 Apr 2017, at 02:05, Kent Karlsson wrote: > >>> Do generic font makers intend to support both graphic terminal emulation and >>> chess? >> >> I don't know. But it should not be impossible to do so. > > And you think the proposal as it does leads to that? Yes. In one single font (according to your current proposal), one can only have EITHER terminal emulator version, OR chess border version. Not both. Using variant selectors for the chess border variants allows for both glyph variants. Maybe it does not make much difference in a proportional font. But for a "mono-width" font the terminal emulator versions for these border characters would be "narrow", but the chess border versions should be "fullwidth"/"square" (compare CJK in terminals; double the width of, e.g., Latin characters). >>> Should chess font makers be burdened with graphic terminal emulation glyphs >>> they know nothing about? >> >> If it is really a chess font, they can just use the glyphs for the chess >> variety also as the "plain" (terminal emulator variety), and it would not >> matter (as long as no one insists on using it for terminal emulation).
> > Ha, so you're saying it's mostly for things like Everson Mono that it matters? > ;-) Yes (but there are other fonts than Everson Mono that are suitable for terminal emulators...). There are still people who read (plain text) emails in terminal emulators (or other email clients that cannot handle font switching inside an email, and may have selected a "terminal emulator" font for viewing emails). Though "mono-width", the chess board glyphs should be "fullwidth"... /Kent K >> All that is needed for that is a manoeuvre to copy a few glyphs within the >> font (when creating the font). I guess that is not very hard? > > It is not. > > Michael Everson From kent.karlsson14 at telia.com Thu Apr 6 11:26:39 2017 From: kent.karlsson14 at telia.com (Kent Karlsson) Date: Thu, 06 Apr 2017 18:26:39 +0200 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <9C3A99D8-D873-41E6-8014-D163C4EF2597@evertype.com> Message-ID: Den 2017-04-06 03:05, skrev "Michael Everson" : > On 6 Apr 2017, at 01:54, Kent Karlsson wrote: > >>>> - some bidi fix [preferably making the box/border drawing characters bidi >>>> "L", if possible; otherwise a caveat that if there is an expectation to >>>> paste in such a board into an RTL document, bidi controls need be used to >>>> LTR the board]). >>> >>> I don't know if there is a problem here and am not able to offer a solution >>> if there is. I don't object to a solution, if there is a problem. >> >> I would think > > Come on. This is a serious proposal. I agree! ;-) > I'm glad you support it, but if you are > going to raise an issue like this, "I would think and guess about a problem" > isn't the same as "I have tried and here's an actual problem". I apologise for my slightly cautious way of expressing myself... All the characters in the "chess board lines" (apart from spaces, if any), are of bidi category ON or NSM. So there is no character that "sets" a bidi direction of the lines ("paragraphs").
So if the bidi setting for display is set to default to RTL, each of the chess board lines will be reversed in display. Now, since the border characters are not mirrored, the left and right side of the board side lines will be somewhat botched. Which is very visible in that it is ugly. (And I guess(!) the reader will notice that...) I'm not a bidi expert, but I know that much about bidi (and so should you...). > Roozbeh, there's an issue that might benefit from your expertise. Can you look > into it? Discussion needn't occur here, but offline with Kent and me, if you > prefer. > >> that anyone pasting a chess board (à la your proposal) to an RTL context will >> see that something went amiss, > > Will they? Why? Since the border characters are not mirrored (they do not have the mirroring property), the left and right side of the chess board side lines will be somewhat botched. Which is visible/ugly. Indeed, the entire chess board will be mirrored (though none of the individual glyphs), but, though visible, that whole-mirroring (line reversal) is easier to miss. >> and also know enough about bidi to set the bidi context to LTR for the chess >> board(s), > > RTL users understand the problems of cutting and pasting LTR text and symbols, > certainly. LTR users don't. > >> either by some setting, or by inserting bidi control characters. > > Well, if there's a problem it should be well-defined so it can be tackled. > >> So a small caveat is all that is necessary. Like: "The chess boards are >> assumed to be set in a left-to-right bidi context." > > THAT I can put into the document, but since chess is as important in both the > RTL and LTR worlds, it would be good to know what's what. See above. /Kent K > Thank you again for your thoughtfulness, > > Michael From beckiergb at gmail.com Thu Apr 6 11:28:04 2017 From: beckiergb at gmail.com (Rebecca Bettencourt) Date: Thu, 6 Apr 2017 09:28:04 -0700 Subject: PETSCII mapping?
In-Reply-To: References: <38d70a68-aabe-a6d1-50cf-cbdf2f92b88f@ix.netcom.com> Message-ID: The Teletext set of 2x3 block characters also covers a significant chunk of the TRS-80 and CoCo character sets: http://www.kreativekorp.com/software/fonts/trs80.shtml I have been thinking of proposing those characters for a while, actually, and that would have been my next proposal after PETSCII. :) The question is, do we want to add these missing graphics characters incrementally, platform by platform, or put together a larger proposal for, say, one big Block Elements Extended block? My first thought is that an incremental approach would make it easier to get characters into the standard, but what do I know. -- Rebecca Bettencourt On Thu, Apr 6, 2017 at 5:07 AM, Rebecca T <637275 at gmail.com> wrote: > Here's a copy of the Teletext character set; it includes box-drawing > characters > for all combinations of a 2×3 grid of cells. 2⁶ = 64 characters, so we > might > need a new block. > > [1]: http://www.galax.xyz/TELETEXT/CHARSET.HTM > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From everson at evertype.com Thu Apr 6 11:29:08 2017 From: everson at evertype.com (Michael Everson) Date: Thu, 6 Apr 2017 17:29:08 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: Message-ID: On 6 Apr 2017, at 17:24, Kent Karlsson wrote: > In one single font (according to your current proposal), one can only have EITHER terminal emulator version, OR chess border version. Not both. Using variant selectors for the chess border variants allows for both glyph variants. Maybe it does not make much difference in a proportional font. But for a "mono-width" font the terminal emulator versions for these border characters would be "narrow", but the chess border versions should be "fullwidth"/"square" (compare CJK in terminals; double the width of, e.g., Latin characters). Hm.
Time for me to put VS support into Everson Mono, then, and see what happens. But I think you're probably right, though. Tak for hjælpet. Michael Everson From beckiergb at gmail.com Thu Apr 6 11:36:36 2017 From: beckiergb at gmail.com (Rebecca Bettencourt) Date: Thu, 6 Apr 2017 09:36:36 -0700 Subject: PETSCII mapping? In-Reply-To: <30B5FF67-D4A4-499F-9447-E56FBE7C6117@evertype.com> References: <38d70a68-aabe-a6d1-50cf-cbdf2f92b88f@ix.netcom.com> <30B5FF67-D4A4-499F-9447-E56FBE7C6117@evertype.com> Message-ID: On Thu, Apr 6, 2017 at 5:25 AM, Michael Everson wrote: > At some point this should be taken off the main list since discussion will > get very detailed very quickly. > I agree. How should we get all the interested parties together? -------------- next part -------------- An HTML attachment was scrubbed... URL: From everson at evertype.com Thu Apr 6 11:39:01 2017 From: everson at evertype.com (Michael Everson) Date: Thu, 6 Apr 2017 17:39:01 +0100 Subject: PETSCII mapping? In-Reply-To: References: <38d70a68-aabe-a6d1-50cf-cbdf2f92b88f@ix.netcom.com> <30B5FF67-D4A4-499F-9447-E56FBE7C6117@evertype.com> Message-ID: <36866524-2CF1-4FE3-B609-FA4B9EF08968@evertype.com> On 6 Apr 2017, at 17:36, Rebecca Bettencourt wrote: > > At some point this should be taken off the main list since discussion will get very detailed very quickly. > > I agree. How should we get all the interested parties together? Everybody interested, raise your hand? Michael Everson From mark at macchiato.com Thu Apr 6 11:45:07 2017 From: mark at macchiato.com (Mark Davis ☕️) Date: Thu, 6 Apr 2017 18:45:07 +0200 Subject: Standaridized variation sequences for the Desert alphabet?
In-Reply-To: References: <20170321104104.665a7a7059d7ee80bb4d670165c8327d.c6e0d0ee2d.wbe@email03.godaddy.com> <84F24B3C-9884-432C-B71F-B8D9D283DE9B@evertype.com> <24975108-52a4-cda4-737d-6a41ff1b5c14@it.aoyama.ac.jp> <17D37ACB-9269-4537-AE60-71BB6CA42366@evertype.com> <7686cee6-1b4f-d1a6-8cd7-09859757465c@it.aoyama.ac.jp> <587FFDFA-CAAE-4F81-B60D-94EB9C550151@evertype.com> <2f05d26e-9d3f-4670-f667-1daf1cd53063@it.aoyama.ac.jp> <6C843948-F554-4C52-B103-36508595C4FB@evertype.com> Message-ID: Mark On Thu, Apr 6, 2017 at 6:11 PM, Michael Everson wrote: > On 6 Apr 2017, at 16:05, Mark Davis ☕️ wrote: > > >> I just get frustrated when everyone including the veterans seems to > forget every bit of precedent that we have for the useful encoding of > characters. > > > > Nobody's forgetting anything. Simply because people disagree with you > doesn't mean they are forgetful or stupid. One could just as well respond > that you are forgetting that Unicode is not a glyph standard. Merely > because a character have multiple shapes is not grounds for disunifying it. > > The ignoring of reasonable precedent does not make the UTC seem > reasonable. In terms of Deseret, the suggestion that characters ??/??/??/?? > with a stroke derived from ?? are glyph variants of one another simply > makes no sense at all. We have honed over many years our understanding of > writing systems, and saying "Oh, ??-with-stroke and ??-with stroke are > variant shapes of the same thing"? Anyone can see that this is not true. > "Anyone" doesn't matter. What matters is users of Deseret, not you, not me. If knowledgeable users of Deseret recognize two shapes as representing the same character, that is what matters. Similarly, users of Fraktur will recognize that *very* different shapes represent the same Latin character, while some very similar (to other's eyes) shapes represent different characters (some of the capitals, for example).
> > The vexing thing is that one can never rely on consistency in the UTC's > approaches to any proposal. I have discussed this with other successful and > prolific proposal writers. It's always a coin-toss as to how a proposal > will be viewed. > > The recent instance of adding attested capital letters for ? and ? is a > perfect example. We have seen before some desire to see evidence for casing > pairs (though often it has not been sought.) We have never before seen > evidence for casing pairs to be thrown out. Case, of course, is a function > of the Latin script, just as it is of Greek and Cyrillic and Armenian and > Cherokee and both Georgian scripts and others. The UTC's refusal to encode > attested capitals for ? and ? simply makes no sense. > To you. > > Your statement "Merely because a character have multiple shapes is not > grounds for disunifying it" suggests an underlying view that "everything is > already encoded and additions are disunifications". No, not at all. That is a false dichotomy. > I do not subscribe to this view. > > Michael Everson -------------- next part -------------- An HTML attachment was scrubbed... URL: From everson at evertype.com Thu Apr 6 12:27:40 2017 From: everson at evertype.com (Michael Everson) Date: Thu, 6 Apr 2017 18:27:40 +0100 Subject: Standaridized variation sequences for the Desert alphabet? In-Reply-To: References: <20170321104104.665a7a7059d7ee80bb4d670165c8327d.c6e0d0ee2d.wbe@email03.godaddy.com> <84F24B3C-9884-432C-B71F-B8D9D283DE9B@evertype.com> <24975108-52a4-cda4-737d-6a41ff1b5c14@it.aoyama.ac.jp> <17D37ACB-9269-4537-AE60-71BB6CA42366@evertype.com> <7686cee6-1b4f-d1a6-8cd7-09859757465c@it.aoyama.ac.jp> <587FFDFA-CAAE-4F81-B60D-94EB9C550151@evertype.com> <2f05d26e-9d3f-4670-f667-1daf1cd53063@it.aoyama.ac.jp> <6C843948-F554-4C52-B103-36508595C4FB@evertype.com> Message-ID: <17849B3C-0BB7-4B64-B942-07001A0DBD62@evertype.com> On 6 Apr 2017, at 17:45, Mark Davis ☕️
wrote: >> We have honed over many years our understanding of writing systems, and saying "Oh, ??-with-stroke and ??-with stroke are variant shapes of the same thing"? Anyone can see that this is not true. > > "Anyone" doesn't matter. What matters is users of Deseret, not you, not me. Firstly, I am a user of Deseret. I have designed Deseret fonts and typeset books and published them. Secondly, professional script encoders like me are the ones who give advice and counsel to people who come to us with encoding needs. > If knowledgeable users of Deseret recognize two shapes as representing the same character, that is what matters. Representing the same sound is not the same thing as representing the same character. > Similarly, users of Fraktur will recognize that very different shapes represent the same Latin character, while some very similar (to other's eyes) shapes represent different characters (some of the capitals, for example). Whole-script identity of Roman, Gaelic, and Fraktur is a different kind of identity than the identification of letterforms based on ??/??/??/?? with the stroke of ??. >> The recent instance of adding attested capital letters for ? and ? is a perfect example. We have seen before some desire to see evidence for casing pairs (though often it has not been sought.) We have never before seen evidence for casing pairs to be thrown out. Case, of course, is a function of the Latin script, just as it is of Greek and Cyrillic and Armenian and Cherokee and both Georgian scripts and others. The UTC's refusal to encode attested capitals for ? and ? simply makes no sense. > > To you. I am not answered by such an abrupt, dismissive response. Please explain how the inconsistency makes sense. >> Your statement "Merely because a character have multiple shapes is not grounds for disunifying it" suggests an underlying view that "everything is already encoded and additions are disunifications". > > No, not at all. That is a false dichotomy.
Well, you used the word "disunify". To me, that means you assume that if a character which can be used for the diphthong /juː/ has been encoded, that when another, different one is found, with a different derivation, then the second is automatically pre-judged to be unified with the first and must be disunified from it. That does not make sense, because we encode writing systems, not sounds. My view on this has been consistent since I first embarked on this work. Michael Everson From verdy_p at wanadoo.fr Thu Apr 6 12:43:46 2017 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Thu, 6 Apr 2017 19:43:46 +0200 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <85BB6AAE-8319-446C-9C42-F6535CF043D4@evertype.com> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <330428926.45676.1491472836694.JavaMail.open-xchange@app06.ox.hosteurope.de> <85BB6AAE-8319-446C-9C42-F6535CF043D4@evertype.com> Message-ID: 2017-04-06 14:57 GMT+02:00 Michael Everson : > On 6 Apr 2017, at 11:00, Christoph Päper > wrote: > > > > Michael Everson : > >> > >> Standardized variation sequences are the best way to achieve this > simply and without needless duplication. :-) > > > > I still agree with this assertion. > > So do I.. ;-) > > >> Yes but you still want it to be reasonably legible when the OpenType > ligatures fail. > > > > This is where I don't follow. > > Why wouldn't you want it to be reasonably legible when the OpenType > ligatures can't be displayed? > > ?????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????????????? > ??????????????????
> ?????????????????? > ?????????? > is far better than this: > ?????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????????????? > ?????????????????? > ??????????????????<< Is it the pawn or the queen that's on the black > square? > ?????????????????? > ?????????? > > > It *looks* far better in a multi-line plain text environment, but that's > a glyphic/typographic/stylistic argument. > > It's an argument for legibility. > And an argument for rendering purposes only; the actual 2D layout of chess diagrams is not part of Unicode and does not have to be encoded. Unicode is not a glyph encoding standard. I still think this is a hack, similar to ASCII art and legacy emojis made of ASCII punctuation like :-) or more complex pseudo-emojis using multiple rows (that do not render correctly when they depend on specific font designs and metrics.) I am still convinced that it does not matter if a legacy rendering will not show white vs. black cells because characters are not rendered in a monospaced font. The argument made for checkered boards here would not apply to many other boards that typically don't have checkered layouts (including for example for playing shogi or go). If we want to add something to represent board cells/tiles in addition to pieces, that encoding should be coherent, not randomly choosing some characters that were not even designed to align with similar metrics (such as ?? and ?? here!) and not really intended to represent (optionally colored) cells in a grid. Nor will this work with other layouts (including shogi that has variants where cells are triangular: you cannot reliably represent them using rows filled with ????.
These characters have implicit internal leading and trailing bearings both horizontally and vertically and cannot have metrics correctly set without breaking other notations that would depend on these bearings, for example in mathematical formulas where they are separated symbols). So you cannot expect rows in rectangular grid patterns made with ?? and ?? to look correct... unless each one is modified with a variant selector saying it should use the full character cell (and there will still be problems with ?? because they will actually need to cover more than their rectangular cells, with two corners extending outside of it with additional kerning, not suitable for mathematics). And the problem with such grid patterns is more generic than just chess diagrams. We should be able to represent directly at least several well-known patterns of cells/tiles (optionally colored when this matters), and then be able to combine them with any character/cluster inside them (for example for classic crosswords, Scrabble, triominos and similar games). We need a way to represent grids made with square/rectangular cells, or triangular/hexagonal cells (for triangular and hexagonal cells we need additional half-cells to properly align rows at least at the start of rows, and hexagonal cells will partly extend over the previous and next row). So I would prefer a proposal to: * add specific symbol characters for these common patterns of cells (rectangular/square, triangular, hexagonal), plus half-cells for use at start and end of rows (if rows are not aligned vertically but create triangular layouts), * optionally followed by some variant selectors for mapping some semantic colors on them (semantic color means "light" and "dark" may be "white" and "black", or "ivory" and "wood", or "yellow" and "red", or "empty/transparent" vs.
"hatched" with monochromatic rendering where colors are replaced by fill patterns such as ///, or dots with some density; we should have about 8 semantic colors, representable with actual colors or grey or fill patterns). The common "black square" and "white square" (the white version would be the default semantic color and would not need any additional variant). * and then use ZWJ to combine them with letters/symbols to be centered within them (possibly some extended clusters such as letters+combining subscript digits in Scrabble). The base set of the first set for cells should be based on old well-known "block graphic" characters used in various legacy computers or teletext/videotex (where characters were all monospaced, so they would line up properly). But Unicode still lacks many useful characters for this purpose (the only exception was for those from IBM PC in "line-drawing" from codepage 437, but the set has been left largely incomplete and these characters are still not suitable for all we need (the lines are all passing through the center of cells, there's no support for horizontal/vertical lines bordering cells, and nothing for diagonals)). The only new thing that did not exist in legacy charsets was the possibility of combining cells and symbols within them (but these legacy displays could use background colors for representing cells). A set of suitable symbols for use with grids in various games is still highly wanted, independently of pieces/symbols that will be rendered in them. For now only gobans are partly supported (using codepage 437 line-drawing characters for empty cells, and the encoded white and black stones when they occupy a cell position, but there's no way to indicate they should be arranged in grids with suitable metrics). -------------- next part -------------- An HTML attachment was scrubbed...
URL: From doug at ewellic.org Thu Apr 6 12:43:45 2017 From: doug at ewellic.org (Doug Ewell) Date: Thu, 06 Apr 2017 10:43:45 -0700 Subject: PETSCII mapping? Message-ID: <20170406104345.665a7a7059d7ee80bb4d670165c8327d.948d96dc25.wbe@email03.godaddy.com> Michael Everson wrote: > Everybody interested, raise your hand? I'm in. Rebecca Bettencourt wrote: > The question is, do we want to add these missing graphics characters > incrementally, platform by platform, or put together a larger proposal > for, say, one big Block Elements Extended block? I would guess the latter. There's no tremendous rush; there should be time to do a proper analysis of target platforms, evaluate which proposed characters should be unified with existing or other proposed characters, and so forth. Of course there's no guarantee this will be the last request ever for 8-bit computer compatibility characters, but there doesn't seem much point in intentionally dragging the process out, platform by platform. -- Doug Ewell | Thornton, CO, US | ewellic.org From maccampus at t-online.de Thu Apr 6 11:04:31 2017 From: maccampus at t-online.de (MacCampus) Date: Thu, 6 Apr 2017 18:04:31 +0200 Subject: Eszett variation sequence In-Reply-To: References: <1564892666.46890.1491482226024.JavaMail.open-xchange@app06.ox.hosteurope.de> Message-ID: <04D68991-5E3B-4281-AF75-27886BEED3DF@t-online.de> Actually, the Berlin street signs are well-known cases of using the alternate form of the German sharp s. I personally have never seen a straight y in German usage anywhere else. For me, both cases can sufficiently be taken care of using OpenType features or simply a dedicated font, as is the case with the lettering in Berlin. The German Wikipedia article on the "ß" (https://de.wikipedia.org/wiki/ß) names the author of the font (Herbert Thannhäuser); in the English version on the letter, this information is missing.
The article dedicated to Herbert Thannhäuser personally (https://de.wikipedia.org/wiki/Herbert_Thannhaeuser; German Wikipedia only) makes it clear that the font used in Berlin was especially commissioned from him, so it was probably more a one-off design. Am 06.04.2017 um 15:26 schrieb Michael Everson : > > http://evertype.com/standards/unicode-list/seydlitzstr.jpg > > Do you think we should encode a Latin straight y (like the Cyrillic one) so we can write Se?dlitzstra??e? > >> >> Would it make sense to propose standardized variation sequences for these styles or should this be left to font features like `cv##` or `calt` in OpenType? > ++++++++++++++++ Sebastian Kempgen MacCampus? Germany From richard.wordingham at ntlworld.com Thu Apr 6 13:21:43 2017 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Thu, 6 Apr 2017 19:21:43 +0100 Subject: Coloured Punctuation and Annotation In-Reply-To: <3805977.19700.1491474887446.JavaMail.defaultUser@defaultHost> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <20170405091030.116883a5@JRWUBU2> <20170405224817.4149f845@JRWUBU2> <9FCA9B1F-00D7-459E-8567-59589609A708@evertype.com> <20170406054107.20e40bd5@JRWUBU2> <3805977.19700.1491474887446.JavaMail.defaultUser@defaultHost> Message-ID: <20170406192143.257cb9f6@JRWUBU2> On Thu, 6 Apr 2017 11:34:47 +0100 (BST) William_J_G Overington wrote: > The following post may be of interest. > > http://www.unicode.org/mail-arch/unicode-ml/y2002-m06/0337.html > > It is part of a thread from 2002 about the possibility of chromatic > fonts.
> > I wonder if it would be possible please for Unicode to have a > Chromatic property that works exactly like the emoji property in the > sense of expressing that a colour version of the glyph is being > requested so that characters such as the one to which Daniel refers > in his post linked above can be listed in The Unicode Standard as > having a variation selector for a Chromatic version as that would be > a respectful terminology for characters used in such applications. Well, who thinks Khaled's example (http://www.amirifont.org/fatiha-colored.html) is a "text presentation" and who thinks it is an "emoji presentation"? I think it's a text presentation. If "text presentations" have to be monochrome, as Asmus claims, I think the UTC ought to effectively change the emoji property from a binary property to an enumerated property with values like "monochrome", "multicolour", and "emoji". There might be technical problems, but I suspect the emoji property is not covered by Unicode stability guarantees. The property is in no way part of the Unicode standard. However, it is probably something best left to a higher order protocol - as may be done for some emoji. Individual requests would have to be justified on a character by character basis. Richard. From beckiergb at gmail.com Thu Apr 6 13:24:26 2017 From: beckiergb at gmail.com (Rebecca Bettencourt) Date: Thu, 6 Apr 2017 11:24:26 -0700 Subject: PETSCII mapping? In-Reply-To: <20170406104345.665a7a7059d7ee80bb4d670165c8327d.948d96dc25.wbe@email03.godaddy.com> References: <20170406104345.665a7a7059d7ee80bb4d670165c8327d.948d96dc25.wbe@email03.godaddy.com> Message-ID: On Thu, Apr 6, 2017 at 10:43 AM, Doug Ewell wrote: > Michael Everson wrote: > > > Everybody interested, raise your hand? > > I'm in. I'm in as well of course.
> Rebecca Bettencourt wrote: > > > The question is, do we want to add these missing graphics characters > > incrementally, platform by platform, or put together a larger proposal > > for, say, one big Block Elements Extended block? > > I would guess the latter. There's no tremendous rush; there should be > time to do a proper analysis of target platforms, evaluate which > proposed characters should be unified with existing or other proposed > characters, and so forth. > > Of course there's no guarantee this will be the last request ever for > 8-bit computer compatibility characters, but there doesn't seem much > point in intentionally dragging the process out, platform by platform. > You make a good point. I'm in either way. :) -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.wordingham at ntlworld.com Thu Apr 6 13:37:24 2017 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Thu, 6 Apr 2017 19:37:24 +0100 Subject: Coloured Punctuation and Annotation In-Reply-To: References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <20170405091030.116883a5@JRWUBU2> <20170405224817.4149f845@JRWUBU2> <9FCA9B1F-00D7-459E-8567-59589609A708@evertype.com> <20170406054107.20e40bd5@JRWUBU2> Message-ID: <20170406193724.2f520aa0@JRWUBU2> On Thu, 6 Apr 2017 01:19:42 -0400 Rebecca T <637275 at gmail.com> wrote: > ... and > aside from usage I see > no difference between U+1F989 OWL 🦉 and U+13153 EGYPTIAN HIEROGLYPH > G017 𓅓. OWL does not have a prescribed attitude. On the other hand, if G017 were not body side on and head face on, I am not sure it would be readable. Additionally, which way the body is oriented is generally important. Richard.
From Shawn.Steele at microsoft.com Thu Apr 6 13:48:36 2017 From: Shawn.Steele at microsoft.com (Shawn Steele) Date: Thu, 6 Apr 2017 18:48:36 +0000 Subject: PETSCII mapping? In-Reply-To: References: <38d70a68-aabe-a6d1-50cf-cbdf2f92b88f@ix.netcom.com> Message-ID: IIRC there was a little bit of difference between computers... A quick Bing search found this, dunno if it's helpful: http://www.aivosto.com/vbtips/petscii.pdf And, of course, often graphics were done by customizing the character set, however those are likely unique and not appropriate for encoding. -Shawn -----Original Message----- From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Alastair Houghton Sent: Thursday, April 6, 2017 2:10 AM To: Elias Mårtenson ; Rebecca Bettencourt Cc: James Kass ; 637275 at gmail.com; unicode ; Asmus Freytag (c) ; Charlotte Buff Subject: Re: PETSCII mapping? On 6 Apr 2017, at 08:25, Elias Mårtenson wrote: > > Wouldn't it make sense to get in touch with active Commodore 64 communities to find out how people deal with this today? I'm sure there are use cases that none of us have thought about. Since most of the issue is graphics characters, and since that same problem affects PETSCII, ATASCII, the ZX80 set, and Teletext/Videotex/Viewdata (aka BBC Micro mode 7), would it be better to come up with a complete set of extra graphic characters that need to be encoded, and make it a proposal to “complete the set” of box drawing and graphics characters? IMO the Teletext set is *much* more important than PETSCII or ATASCII; while there will very likely be text encoded in the latter two, there are significant volumes encoded in the Teletext set. Quite a bit of data has already been lost (there are mirrors of old Prestel/Viewdata BBS systems, some of which have sadly lost all the graphics because of the lack of equivalent Unicode characters), and a lot of the rest is either encoded in a non-standard encoding or held as screen shots.
Also, it would be worth looking to see if there are any discussions from past attempts to get any of these things into the Unicode standard; I can’t imagine this is the first time anyone’s asked for more graphics characters. Kind regards, Alastair. -- http://alastairs-place.net From wjgo_10009 at btinternet.com Thu Apr 6 13:39:51 2017 From: wjgo_10009 at btinternet.com (William_J_G Overington) Date: Thu, 6 Apr 2017 19:39:51 +0100 (BST) Subject: Coloured Punctuation and Annotation Message-ID: <4202283.53811.1491503991391.JavaMail.defaultUser@defaultHost> Michael Everson wrote: > No. Here is an example of a font available in two variants. In one variant, all those grey swirls are fused to the letters, and it can all be printed in black or one colour ink. > http://cdn.myfonts.net/s/aw/original/255/0/131020.png > There is also a second set of fonts included which separates the swirls from the letters, and those can be used in typesetting to get the two-colour effect you see here. That can’t really be done using standard encoding. You’d probably see IIVVOORRYY in the backing store for that word, with every other letter being set in the letter font and the swirl font. Richard Wordingham mentioned the following. > The third glyph would use 'index' 0xFFFF to specify that it be displayed in the foreground colour. If the OpenType specification were augmented so that 'index' 0xFFFE were to specify that the appropriate part of the glyph be displayed in the "first decoration colour", a colour specified in the application program and not in the font; and an application program were augmented so that an end user were able to choose first decoration colour as well as choosing foreground colour, then would that produce the result for which Michael is looking?
William Overington Thursday 6 April 2017 From asmusf at ix.netcom.com Thu Apr 6 15:17:36 2017 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Thu, 6 Apr 2017 13:17:36 -0700 Subject: Coloured Punctuation and Annotation In-Reply-To: <20170406192143.257cb9f6@JRWUBU2> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <20170405091030.116883a5@JRWUBU2> <20170405224817.4149f845@JRWUBU2> <9FCA9B1F-00D7-459E-8567-59589609A708@evertype.com> <20170406054107.20e40bd5@JRWUBU2> <3805977.19700.1491474887446.JavaMail.defaultUser@defaultHost> <20170406192143.257cb9f6@JRWUBU2> Message-ID: An HTML attachment was scrubbed... URL: From 637275 at gmail.com Thu Apr 6 15:37:49 2017 From: 637275 at gmail.com (Rebecca T) Date: Thu, 6 Apr 2017 16:37:49 -0400 Subject: PETSCII mapping? In-Reply-To: References: <20170406104345.665a7a7059d7ee80bb4d670165c8327d.948d96dc25.wbe@email03.godaddy.com> Message-ID: Count me in! I’m partial to one large unified proposal, FWIW. On Thu, Apr 6, 2017 at 2:24 PM, Rebecca Bettencourt wrote: > On Thu, Apr 6, 2017 at 10:43 AM, Doug Ewell wrote: > >> Michael Everson wrote: >> >> > Everybody interested, raise your hand? >> >> I'm in. > > > I'm in as well of course. > > >> Rebecca Bettencourt wrote: >> >> > The question is, do we want to add these missing graphics characters >> > incrementally, platform by platform, or put together a larger proposal >> > for, say, one big Block Elements Extended block? >> >> I would guess the latter. There's no tremendous rush; there should be >> time to do a proper analysis of target platforms, evaluate which >> proposed characters should be unified with existing or other proposed >> characters, and so forth.
>> >> Of course there's no guarantee this will be the last request ever for >> 8-bit computer compatibility characters, but there doesn't seem much >> point in intentionally dragging the process out, platform by platform. >> > > You make a good point. I'm in either way. :) > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at kli.org Thu Apr 6 19:54:48 2017 From: mark at kli.org (Mark E. Shoulson) Date: Thu, 6 Apr 2017 20:54:48 -0400 Subject: PETSCII mapping? In-Reply-To: References: <38d70a68-aabe-a6d1-50cf-cbdf2f92b88f@ix.netcom.com> Message-ID: On 04/05/2017 05:25 PM, Rebecca T wrote: > > As time goes on, “not in widespread use” will become a flimsier and > flimsier > argument against inclusion Indeed. This is the chicken-and-egg problem, and you are not the first to (rightly) point it out as a flimsy excuse. Thanks for bringing it up again, though: people still seem to go back to it a lot. ~mark From mark at kli.org Thu Apr 6 20:19:32 2017 From: mark at kli.org (Mark E. Shoulson) Date: Thu, 6 Apr 2017 21:19:32 -0400 Subject: PETSCII mapping? In-Reply-To: References: <38d70a68-aabe-a6d1-50cf-cbdf2f92b88f@ix.netcom.com> Message-ID: <7d6aa08e-7372-fa95-9532-0ce4cc7a9b45@kli.org> On 04/06/2017 08:07 AM, Rebecca T wrote: > Here’s a copy of the Teletext character set; it includes box-drawing > characters > for all combinations of a 2×3 grid of cells. 2⁶ = 64 characters, so we > might > need a new block. > > [1]: http://www.galax.xyz/TELETEXT/CHARSET.HTM > My old TRS-80 also did "graphics" like this, with 64 2×3 cells. That was even how it did it when you were setting individual blocks. The smallest "pixel" you could control in graphics was one of these ⅙ths of a character cell, and wouldn't you know it? As soon as you set one in a cell occupied by some other character, the other character would disappear. Not positive these count as plain text, but there's a decent argument for it.
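[Editorial aside, not part of the original post.] The count of 64 mosaic characters quoted above follows from each of the six sub-cells being either set or clear. A minimal Python sketch of that enumeration is given here; the function names, the row-major ordering, and the `#`/`.` rendering are assumptions made purely for illustration and do not reflect the code assignments of Teletext, the TRS-80, or any other platform.

```python
from itertools import product

ROWS, COLS = 3, 2  # one character cell split into 2 columns by 3 rows of sub-cells


def mosaic_patterns():
    """Yield every on/off combination of the six sub-cells as a tuple of rows."""
    for bits in product((0, 1), repeat=ROWS * COLS):
        # Slice the flat six-bit tuple into three rows of two sub-cells each.
        yield tuple(bits[r * COLS:(r + 1) * COLS] for r in range(ROWS))


def render(pattern):
    """Draw one pattern as ASCII art: '#' for a set sub-cell, '.' for a clear one."""
    return "\n".join("".join("#" if b else "." for b in row) for row in pattern)


patterns = list(mosaic_patterns())
print(len(patterns))          # number of distinct 2x3 mosaics
print(render(patterns[-1]))   # the fully set cell
```

How many of these shapes would need new code points is a separate question, since some (the empty cell, the full block, the half blocks) may unify with existing characters.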
~mark From verdy_p at wanadoo.fr Thu Apr 6 21:53:39 2017 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Fri, 7 Apr 2017 04:53:39 +0200 Subject: PETSCII mapping? In-Reply-To: <7d6aa08e-7372-fa95-9532-0ce4cc7a9b45@kli.org> References: <38d70a68-aabe-a6d1-50cf-cbdf2f92b88f@ix.netcom.com> <7d6aa08e-7372-fa95-9532-0ce4cc7a9b45@kli.org> Message-ID: This 2x3 block graphic set was also part of Videotex/Teletext/Antiope standards in Europe (used on PCs, dedicated terminals, and TV programs, and still supported in more recent teletext technologies, even if many smart TVs offer other interactive protocols based on web standards, or possibly embedding an HTML/CSS/JavaScript rendering engine, sometimes even with Android SDK support for applications). It has even been implemented in some TV networks in the US. Before graphic displays became widespread (when the EGA standard started being added, then when non-monochromatic monitors appeared almost immediately after it), almost all text terminals had such minimal support for such "mosaic" graphics. Only the original IBM PC had a much more limited set, using 1x2 blocks, while using a box-drawing graphic subset in their legacy codepages. The original IBM logo was made of these 1x2 blocks. 2017-04-07 3:19 GMT+02:00 Mark E. Shoulson : > On 04/06/2017 08:07 AM, Rebecca T wrote: > >> Here’s a copy of the Teletext character set; it includes box-drawing >> characters >> for all combinations of a 2×3 grid of cells. 2⁶ = 64 characters, so we >> might >> need a new block. >> >> [1]: http://www.galax.xyz/TELETEXT/CHARSET.HTM >> >> My old TRS-80 also did "graphics" like this, with 64 2×3 cells. That was > even how it did it when you were setting individual blocks. The smallest > "pixel" you could control in graphics was one of these ⅙ths of a character > cell, and wouldn't you know it? As soon as you set one in a cell occupied > by some other character, the other character would disappear.
> > Not positive these count as plain text, but there's a decent argument for > it. > > ~mark > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gwalla at gmail.com Thu Apr 6 23:52:05 2017 From: gwalla at gmail.com (Garth Wallace) Date: Thu, 6 Apr 2017 21:52:05 -0700 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <59398711.46744.1491481164274.JavaMail.open-xchange@app06.ox.hosteurope.de> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> <59398711.46744.1491481164274.JavaMail.open-xchange@app06.ox.hosteurope.de> Message-ID: On Thu, Apr 6, 2017 at 5:19 AM, Christoph Päper wrote: > Mark Davis ☕️ : > > > > I'm looking forward to similar postings on checkers and go pieces. (...) > > And I'm looking also forward to the ?+ZWJ+?? (etc) proposal. > > Well, actually ... > > Garth Wallace made an important observation in > : > > >> Currently, chess fonts can be (roughly) divided into "diagram fonts" and > >> "notation fonts". > > The major goal of Michael Everson's proposal to introduce standardized > sequences > with variant selectors 1 and 2 (U+FE00/1) for chess piece characters > (primarily > U+2654-F), as far as I understand it, is to assure "diagram" glyph design. > This > means fixed-width figurines centered in the character cell, with means for > square color and board border elements (incl. labels), whereas "notation" > style > usually has proportional figurines sitting on the baseline. > > This is all correct.
> The square color is ignored in standard chess notation, where fields are > conventionally known by their alphabetic column index ("file", A-H for a > standard checkerboard) and numeric row index ("rank", 1-8), i.e. A1 > through H8, > which are virtually never styled as "?A1", "?B1", "B2" > whereas figurine charactres may either augment or substitute conventional > letter > symbols. > Usually substitute, but yes. > > - Black/dark squares are those whose file and rank are either both odd > or both > even. > - White/light squares are those whose file is odd and rank is even, or > vice > versa. > > Corollary: The glyph background is almost only important within diagram > notation. > Yes, this is acknowledged in the proposal. > > Diagrams may only show select squares, so the color of the first or last > one and > hence intermediate ones cannot necessarily be deduced from the immediate > context. (They may be implied by row and column labels, which is simple > for a > sighted human reader, but complex for computers and blind readers.) > Also, I believe that existing fonts and text rendering systems cannot tell what is above or below the current line, so there is no way to determine what the background should be based on rows. Although Michael Everson readily dismisses any connection to emojis, e.g. > L2/16-021 or L2/16-087+088, and hence the Emoji and Emoji_Presentation > character > properties as well as sequences with variation selectors 15 and 16 > (U+FE0E/F), > normal emoji design actually matches "diagram" notation quite nicely in > that all > emoji glyphs are rendered within an (ideographic / em) square. Black and > white > squares are also already available as emojis in small U+25AA/B, > medium-small > U+25FE/D, medium U+25FC/B and large U+2B1B/C. The last ones would probably > be > preferred. Only the first ones are default text style characters. The > characters > for empty squares from the proposal, U+25A8/1, have no emoji > representation yet. 
> I've suggested to use the hatching characters U+25Ax for their colors as in > heraldic tinctures, which relate U+25A8 to Purple ("purpure"). > The only connection this has with emoji is that it uses the variation selector system. Emoji is a means of indicating full-color embedded images, and usually relies on rendering or interpreting systems that substitute graphics, or various non-standardized font extensions. None of that is necessary, or relevant, to chess diagrams. I don't believe emoji are even necessarily fixed-width. That's incidental (and I think some implementations of the flags are wider than 1em). > Without the need for ZWJ sequences, Opentype fonts can employ their > Contextual > Alternates `calt` feature to select the correct background color in diagram > notation: In a sequence of up to eight chess pieces without an empty > square with > explicit color, an initial U+2656-FE0F White Rook, U+2654-FE0F White King, > U+265B-FE0F Black Queen or U+265F-FE0F Black Pawn would default to a black > background, U+2659-FE0F White Pawn, U+2655 White Queen, U+265A-FE0F Black > King > or U+265C-FE0F Black Rook to a white background. Other than that, each > character > uses the alternate glyph with opposing background color from its preceding > (left-side) glyph. The empty squares work as explicit anchors. > I don't see how this would work. Any row with a white rook on the first file would start with a black background? What? -------------- next part -------------- An HTML attachment was scrubbed...
URL: From verdy_p at wanadoo.fr Fri Apr 7 02:01:15 2017 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Fri, 7 Apr 2017 09:01:15 +0200 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: <9C3A99D8-D873-41E6-8014-D163C4EF2597@evertype.com> Message-ID: 2017-04-06 18:26 GMT+02:00 Kent Karlsson : > > Den 2017-04-06 03:05, skrev "Michael Everson" : > > > On 6 Apr 2017, at 01:54, Kent Karlsson > wrote: > > > >>>> - some bidi fix [preferably making the box/border drawing characters > bidi > >>>> "L", if possible; otherwise a caveat that if there is an expectation > to > >>>> paste in such a board into an RTL document, bidi controls need be > used to > >>>> LTR the board]). > >>> > >>> I don’t know if there is a problem here and am not able to offer a > solution > >>> if there is. I don’t object to a solution, if there is a problem. > >> > >> I would think > > > > Come on. This is a serious proposal. > > Mojibake is back on this list! Some people on this list still use old mail agents not conforming to Unicode with legacy 8-bit charsets incorrectly mixed... -------------- next part -------------- An HTML attachment was scrubbed...
URL: From richard.wordingham at ntlworld.com Fri Apr 7 05:01:24 2017 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Fri, 7 Apr 2017 11:01:24 +0100 Subject: Coloured Punctuation and Annotation In-Reply-To: References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <20170405091030.116883a5@JRWUBU2> <20170405224817.4149f845@JRWUBU2> <9FCA9B1F-00D7-459E-8567-59589609A708@evertype.com> <20170406054107.20e40bd5@JRWUBU2> <3805977.19700.1491474887446.JavaMail.defaultUser@defaultHost> <20170406192143.257cb9f6@JRWUBU2> Message-ID: <20170407110124.0ea3606e@JRWUBU2> On Thu, 6 Apr 2017 13:17:36 -0700 Asmus Freytag wrote: > While it appears possible, after Khaled's demonstration, I still > think that the use of "white ink" instead of the "white" parts of a > character being treated "transparent" is far from standard text > presentation. (And I've yet to see an example that's motivated by > anything other than emoji). I think multicoloured fonts for plain text are in their infancy. However, we now need to be on guard against the natural conflation of 'white' and 'transparent'. UTS#51 has a good paragraph on the topic in the 'Design Guidelines' section: "Names of symbols such as BLACK MEDIUM SQUARE or WHITE MEDIUM SQUARE are not meant to indicate that the corresponding character must be presented in black or white, respectively; rather, the use of “black” and “white” in the names is generally just to contrast filled versus outline shapes, or a darker color fill versus a lighter color fill. Similarly, in other symbols such as the hands U+261A BLACK LEFT POINTING INDEX and U+261C WHITE LEFT POINTING INDEX, the words “white” and “black” also refer to outlined versus filled, and do not indicate skin color."
I think it would be worth making those points in the Unicode Standard - I suggest the section on 'Geometric Shapes', which is Section 22.8 in TUS 9.0.0. Of course, if U+25A1 WHITE SQUARE is the outline of a square, it then seems odd that a valid presentation form should be just a spacing glyph, as seems to be preferred for chess boards! I suppose this could be considered an edge case :-) Richard. From everson at evertype.com Fri Apr 7 07:06:28 2017 From: everson at evertype.com (Michael Everson) Date: Fri, 7 Apr 2017 13:06:28 +0100 Subject: Coloured Punctuation and Annotation In-Reply-To: <20170407110124.0ea3606e@JRWUBU2> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <20170405091030.116883a5@JRWUBU2> <20170405224817.4149f845@JRWUBU2> <9FCA9B1F-00D7-459E-8567-59589609A708@evertype.com> <20170406054107.20e40bd5@JRWUBU2> <3805977.19700.1491474887446.JavaMail.defaultUser@defaultHost> <20170406192143.257cb9f6@JRWUBU2> <20170407110124.0ea3606e@JRWUBU2> Message-ID: <63D06BCF-2869-451A-86B1-60AE2513B425@evertype.com> On 7 Apr 2017, at 11:01, Richard Wordingham wrote: > > Of course, if U+25A1 WHITE SQUARE is the outline of a square, it then seems odd that a valid presentation form should be just a spacing glyph, as seems to be preferred for chess boards! I suppose this could be considered an edge case :-) Using SP or NBSP would not be a good idea. Spaces separate things and have complex properties. The light and dark squares on a chessboard are squares, not one square and one nirvāṇic emptiness. Yes, the VS applied to WHITE SQUARE makes it em-square sized and removes the outline, but that’s a specific glyph for a specific purpose.
Michael Everson From wjgo_10009 at btinternet.com Fri Apr 7 02:56:31 2017 From: wjgo_10009 at btinternet.com (William_J_G Overington) Date: Fri, 7 Apr 2017 08:56:31 +0100 (BST) Subject: PETSCII mapping? Message-ID: <8051583.5970.1491551791825.JavaMail.defaultUser@defaultHost> > At some point this should be taken off the main list since discussion will get very detailed very quickly. > I agree. How should we get all the interested parties together? > Everybody interested, raise your hand Yes please. William From christoph.paeper at crissov.de Fri Apr 7 17:17:10 2017 From: christoph.paeper at crissov.de (Christoph Päper) Date: Sat, 8 Apr 2017 00:17:10 +0200 (CEST) Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> <59398711.46744.1491481164274.JavaMail.open-xchange@app06.ox.hosteurope.de> Message-ID: <1070748515.52412.1491603430824.JavaMail.open-xchange@app06.ox.hosteurope.de> Garth Wallace : > On Thu, Apr 6, 2017 at 5:19 AM, Christoph Päper > wrote: > > > Although Michael Everson readily dismisses any connection to emojis, (...) > > normal emoji design actually matches "diagram" notation quite nicely in > > that all > > emoji glyphs are rendered within an (ideographic / em) square. Black and > > white > > squares are also already available as emojis (...) > > The only connection this has with emoji is that it uses the variation selector > system. As I've shown, that's not the *only* connection.
> Emoji is a means of indicating full-color embedded images, That may be how they are often conceived. It's not quite true, though. The first sentence of UTS#51 reads: Emoji are pictographs (pictorial symbols) that are typically presented in a colorful cartoon form and used inline in text. That sounds similar, but is different in an important detail: the scope, active vs. passive. Emojis do not reference images, but systems present them as images. > ... and usually rely on rendering or interpreting systems that substitute > graphics, Such systems are widespread, but emojis don't rely on them since they are true characters. Symbol fonts like Font Awesome, for instance, are usually PUA-only, although they should be using codepoints of existing emojis and other symbols as much as possible for fallback and interoperability reasons. They don't because they'd risk that some of their consistent, monochrome glyphs would be replaced with a colorful picture by an overly aggressive system. > or various non-standardized font extentions. Opentype 1.8 standardizes as many as four approaches to colorful glyphs. > None of that is necessary, or relevant, to chess diagrams. Chess diagrams (unlike chess notation) are often rendered as graphics, not text. Board and glyphs may have fancy designs and colors, e.g. wooden fields. You would get that for free with emoji presentation, but fonts would not have to use more than two colors. Note that "white" chess pieces should usually not be rendered as outlines only, i.e. with transparent eyes, but with a solid whitish interior. Almost all other pre-emoji characters with "White" in their standard name are hollow. "Black" means always 'solid', but the actual color can be any (including white), although text is still mostly typeset in a blackish color. > I don't believe emoji are even necessarily fixed-width. In all existing implementations they are. They are even always square. 
I'm not sure whether their em square always matches the sinographic ("ideographic") square, but it seems as if it usually does. > > Without the need for ZWJ sequences, Opentype fonts can employ their > > Contextual > > Alternates `calt` feature to select the correct background color in diagram > > notation: In a sequence of up to eight chess pieces without an empty square > > with > > explicit color, an initial U+2656-FE0F White Rook, U+2654-FE0F White King, > > U+265B-FE0F Black Queen or U+265F-FE0F Black Pawn would default to a black > > background, U+2659-FE0F White Pawn, U+2655 White Queen, U+265A-FE0F Black > > King > > or U+265C-FE0F Black Rook to a white background. Other than that, each > > character > > uses the alternate glyph with opposing background color from its preceding > > (left-side) glyph. The empty squares work as explicit anchors. > > I don't see how this would work. Any row with a white rook on the first file > would start with a black background? What? Usually empty squares would determine the backgrounds of adjacent pieces, but if you have a line of adjacent chess pieces (up to 8 by standard rules) without any empty square (e.g. before the first move), there would need to be some kind of heuristic. I chose the colors of pieces that do not come in pairs (i.e. King and Queen) and of the left-most figurines (Rook and Pawn) for a board with white at the bottom, because that is the most common orientation. If you want, I could write and post the code in Adobe OT feature file notation required for `calt` to demonstrate that this would yield results as expected for all full-size 8*8 diagrams and even for many detail diagrams of a section of the board. 
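[Editorial aside, not part of the original post.] Pending the feature-file code offered above, the heuristic as described in this message can be sketched in Python. This is only an illustration under stated assumptions: it is not OpenType `calt` code and not part of the proposal; the names `backgrounds`, `INITIAL_DEFAULT` and `EMPTY` are hypothetical; and using the first explicitly coloured empty square (taken here to be U+25A1 for light and U+25A8 for dark) as the anchor for the whole row is an assumption, since the message does not say how pieces to the left of an anchor are resolved.

```python
LIGHT, DARK = "light", "dark"

# Empty squares with explicit colour (assumed: U+25A1 light, U+25A8 dark).
EMPTY = {"\u25A1": LIGHT, "\u25A8": DARK}

# Default background for the first piece of a row that has no empty square.
INITIAL_DEFAULT = {
    "\u2656": DARK,   # white rook
    "\u2654": DARK,   # white king
    "\u265B": DARK,   # black queen
    "\u265F": DARK,   # black pawn
    "\u2659": LIGHT,  # white pawn
    "\u2655": LIGHT,  # white queen
    "\u265A": LIGHT,  # black king
    "\u265C": LIGHT,  # black rook
}


def flip(colour):
    """Return the opposite background colour."""
    return DARK if colour == LIGHT else LIGHT


def backgrounds(row):
    """Return the background colour of each square in one diagram row."""
    # Variation selectors (U+FE00..U+FE0F) do not occupy a square of their own.
    squares = [ch for ch in row if not "\uFE00" <= ch <= "\uFE0F"]
    if not squares:
        return []
    anchor = next((i for i, ch in enumerate(squares) if ch in EMPTY), None)
    if anchor is not None:
        start, colour = anchor, EMPTY[squares[anchor]]
    else:
        start, colour = 0, INITIAL_DEFAULT.get(squares[0])
        if colour is None:
            # The message gives no default for an initial knight or bishop.
            raise ValueError("no default background for the first piece")
    # Colours alternate along the row, counted from the anchoring square.
    return [colour if (i - start) % 2 == 0 else flip(colour)
            for i in range(len(squares))]


# Black's back rank before the first move, with white at the bottom.
print(backgrounds("\u265C\u265E\u265D\u265B\u265A\u265D\u265E\u265C"))
```

For the four fully occupied ranks of the starting position this agrees with the parity rule quoted earlier in the thread (a1 dark, a2 light, a7 dark, a8 light).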
From everson at evertype.com Fri Apr 7 17:41:35 2017 From: everson at evertype.com (Michael Everson) Date: Fri, 7 Apr 2017 23:41:35 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <1070748515.52412.1491603430824.JavaMail.open-xchange@app06.ox.hosteurope.de> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> <59398711.46744.1491481164274.JavaMail.open-xchange@app06.ox.hosteurope.de> <1070748515.52412.1491603430824.JavaMail.open-xchange@app06.ox.hosteurope.de> Message-ID: <3A6FCC31-F335-40EE-B842-F690E413D3B1@evertype.com> On 7 Apr 2017, at 23:17, Christoph Päper wrote: >> The only connection this has with emoji is that it uses the variation selector system. > > As I've shown, that's not the *only* connection. Christoph, YOU ARE WRONG. Emoji has a special relationship with vendors and a particular implementation environment. Vendors via the UTC look at symbol and pictograph and other characters and decide if they want to give these symbols and pictographs and other characters the special characteristic which implies generally colour rendering and implies an obligation to supply input methods for those characters. That is expensive, and while evidently there are users who need to send BROCCOLI to one another, nobody but nobody needs to send an 8 x 8 chessboard matrix in a tweet. Get it? Emoji has nothing to do with the proposal to support standardized variation sequences for use with chess characters to provide support for their usage in chessboard diagrams. Please stop trying to conflate emoji and chess characters.
It is NOT, I think, a solution which the UTC would agree to. I would oppose it in SC2. >> None of that is necessary, or relevant, to chess diagrams. > > Chess diagrams (unlike chess notation) are often rendered as graphics, not text. Because there is no robust text representation of chess diagrams. This proposal shows how very easy it is to support that behaviour, in parseable and interchangeable text, so that unparseable graphics don’t have to be used. > Board and glyphs may have fancy designs and colors, e.g. wooden fields. Two centuries of standard chess diagramming practice is all that’s needed to support. That’s text. That’s data. That’s what’s important. You want a pretty chess program, you can go download one. That’s not the same as this. >> I don't believe emoji are even necessarily fixed-width. > > In all existing implementations they are. That’s not true. > They are even always square. I'm not sure whether their em square always matches the sinographic ("ideographic") square, but it seems as if it usually does. Not always, and that’s enough chaos. There is no standardization currently in chess fonts. One of them splits queens and rooks into two separate characters. This proposal solves that. > Without the need for ZWJ sequences, Opentype fonts can employ their Contextual Alternates `calt` feature to select the correct background color in diagram notation: In a sequence of up to eight chess pieces without an empty square with explicit color, an initial U+2656-FE0F White Rook, U+2654-FE0F White King, U+265B-FE0F Black Queen or U+265F-FE0F Black Pawn would default to a black background, U+2659-FE0F White Pawn, U+2655 White Queen, U+265A-FE0F Black King or U+265C-FE0F Black Rook to a white background. Other than that, each character uses the alternate glyph with opposing background color from its preceding (left-side) glyph. The empty squares work as explicit anchors. Well that’s a lot of effort to go to. And there’s no legible fallback if the “calt”
features can’t be invoked. This is a bad solution. Thank you for suggesting it.

> If you want, I could write and post the code in Adobe OT feature file notation required for `calt` to demonstrate that this would yield results as expected for all full-size 8*8 diagrams and even for many detail diagrams of a section of the board.

And when ‘calt’ substitutions can’t be displayed? What kind of fallback do you have?

Michael Everson

From everson at evertype.com Fri Apr 7 17:42:15 2017
From: everson at evertype.com (Michael Everson)
Date: Fri, 7 Apr 2017 23:42:15 +0100
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <330428926.45676.1491472836694.JavaMail.open-xchange@app06.ox.hosteurope.de> <85BB6AAE-8319-446C-9C42-F6535CF043D4@evertype.com>
Message-ID: <043455F1-169A-4C2F-AAA4-832022B1B32B@evertype.com>

On 6 Apr 2017, at 18:43, Philippe Verdy wrote:

>> It’s an argument for legibility.
>
> And an argument for rendering purpose only;

Why? Shouldn’t human beings be able to read things that are rendered?

> the actual 2D layout of chess diagrams is not part of Unicode

The chesspiece characters are. And the chess community are not using those characters, but rather ASCII hacks, because there is no robust higher level protocol that can help them do what they need to do.

> and does not have to be encoded.

Why shouldn’t the chess community be able to use the Universal Character Set for their data? And what do you care?
If you don’t care about chess or chess data, then you don’t have to use the chesspiece characters or the variation sequences which permit the additional functionality that is needed.

> Unicode is not a glyph encoding standard.

The Universal Character Set contains Variation Selectors designed to permit encoded characters to have specific glyph variants for certain purposes. This proposal makes use of them.

> I still think this is a hack,

The hack is the set of incompatible ASCII hacks used currently by the chess community. This proposal allows them to do the same work in an interchangeable way with UCS characters.

> similar to ASCII art

To the degree that lead-type chessboard typography is the equivalent of “ASCII art”, that is a fair enough description. It’s “UCS art”. And there is nothing wrong with that.

> and legacy emojis made of ASCII punctuations like :-) or more complex pseudo-emojis using multiple rows (that do not render correctly when they depend on specific font designs and metrics.)

So, you’re saying, you don’t know what you’re talking about.

> I am still convinced that it does not matter if a legacy rendering will not show white vs. black cells because characters are not rendered in a monospaced font.

Monowidth fonts are not required for legible fallback. And according to this scheme, proportional fonts also give legible fallback. It may not be quite as legible, if the chesspiece glyphs are proportional, but it isn’t all that bad. I have provided a number of sample images already. There are also some in the proposal.

> The argument exposed for checkered boards here would not apply for many other boards that typically don't have checkered layouts (including for example for playing shogi or go).

So what? Shogi and gō are different things and the graphic notation for them is different from the graphic notation for chess. This proposal is for chess.
> If we want to add something to represent board cells/tiles in addition to pieces, that encoding should be coherent

This proposal for standardized variation sequences is coherent. It might help if you were to read it.

> and not choosing randomly some characters that were not even designed to align with similar metrics (such as ?? and ?? here!) and not really intended to represent (optionally colored) cells in a grid.

I didn’t choose those characters randomly. I chose them with foresight and care. They are graphic symbols representing, in one case, a white square, and in the other, a square with a particular diagonal fill. Amazingly, these are graphic characters which can be used for ... the representation of graphic characters. And with Variation Selectors, chessboard-suitable glyphs can be evoked, and the characters can be used for the purposes outlined in the proposal. That’s what Variation Selectors are for.

> As well this will not work with other layouts (including shogi that has variants where cells are triangular:

There, there, Mr Verdy. It’s all right. It’s not supposed to work with shogi. It’s a proposal for chess. If it had been a proposal for shogi, perhaps it would, you know, mention shogi.

> you cannot reliably represent them using rows filled with ????. These characters have implicit internal leading and trailing bearings

Characters do not have leading and trailing bearings. Glyphs do.

> both horizontally and vertically and cannot have metrics correctly set without breaking other notations that would depend on these bearings, for example in mathematic formulas where they are separated symbols).

This proposal doesn’t have anything to do with shogi triangles. This proposal is about chess notation. This proposal solves a problem, that people setting Please read the proposal.

> So you cannot expect rows in rectangular grid patterns made with ?? and ?? to look correct...
> unless they are each one modified with a variant selector saying they should use the full character cell

Um, Mr Verdy? READ THE BLOODY PROPOSAL. LOOK AT THE PICTURES IN THE PROPOSAL. TRY TO UNDERSTAND THE PROPOSAL. Thanks.

(I’ll help you out a little. You see, the proposal does modify ?? and ?? with Variant Selectors in order to ensure that rows in rectangular grid patterns made with ?? and ?? look correct. The examples in the proposal were made with fonts using the base characters and VS sequences proposed.)

> (and there will still be problems with ?? because they will actually need to cover more than their rectangular cells with two corners extending outside of it with additional kerning, not suitable for mathematics).

? and ? have nothing to do with this proposal, because those shapes are not used in Chess. Actually, I don’t think they have anything to do with shogi, either. None of the boards on the Wikipedia pages about shogi (in English, French, or Japanese) have any triangle-shaped board cells on them. What’s the French for “red herring”?

> And the problem with such grid patterns is more generic than just chess diagrams.

All symbol systems have potential similarities to one another.

> We should be able to represent directly at least several well known patterns of cells/tiles (optionally colored when this matters), and then be able to combine them with any character/cluster inside them (for example for classic crosswords, Scrabble, triominos and similar games).

I don’t need to do that. I need a simple way to use the UCS to do what people have been doing with chess data for

> We need a way to represent grids made with square/rectangular cells, or triangular/hexagonal cells (for triangular and hexagonal cells we need additional half-cells to properly align rows at least at start or rows, and hexagonal cells will partly extend over the previous and next row

I don’t even know if all of that’s feasible in fonts.
> So I would prefer a proposal to:
> * add specific symbol characters for these common patterns of cells (rectangular/square, triangular, hexagonal), plus half-cells for use at start and end of rows (if rows are not aligned vertically but create triangular layouts),

You can write proposals for anything you want to.

> * optionally followed by some variant selectors for mapping some semantic colors on them (semantic color means "light" and "dark" may be "white" and "black", or "ivory" and "wood", or "yellow" and "red", or "empty/transparent" vs. "hatched" with monochromatic rendering where colors are replaced by fill patterns such as ///, or dots with some density; we should have about 8 semantic colors, representable with actual colors or grey or fill patterns). The common "black square" and "white square" (the white version would be the default semantic color and would not need any additional variant).

“The more you overthink the plumbing, the easier it is to stop up the drain.” – Cmdr Montgomery Scott.

> * and then use ZWJ to combine them with letters/symbols to be centered within them (possibly some extended clusters such as letters+combining subscript digits in Scrabble)

Scrabble. My word. No.

The present proposal meets a particular need: To enable the UCS to be able to set chess diagrams.
Michael Everson

From 637275 at gmail.com Fri Apr 7 18:28:12 2017
From: 637275 at gmail.com (Rebecca T)
Date: Fri, 7 Apr 2017 19:28:12 -0400
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <3A6FCC31-F335-40EE-B842-F690E413D3B1@evertype.com>
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> <59398711.46744.1491481164274.JavaMail.open-xchange@app06.ox.hosteurope.de> <1070748515.52412.1491603430824.JavaMail.open-xchange@app06.ox.hosteurope.de> <3A6FCC31-F335-40EE-B842-F690E413D3B1@evertype.com>
Message-ID:

> while evidently there are users who need to send BROCCOLI to one another,
> nobody but nobody needs to send an 8 x 8 chessboard matrix in a tweet. Get
> it?

I simply must disagree; sending a textual chessboard sounds awesome! A twitter bot that plays chess with you and shows you a graphical representation of the board would be great!

Don’t interpret this as an advocation of making chess pieces emoji, however; although that might be interesting, I’ll leave the actual decision making to those more experienced in that particular domain – I’m simply saying that I think there are lots of rad potential applications for putting chessboards in tweets.

Oh! We could host tournaments on twitter and merge the discussion into the actual tournament! That would be super cool!

From everson at evertype.com Fri Apr 7 18:33:14 2017
From: everson at evertype.com (Michael Everson)
Date: Sat, 8 Apr 2017 00:33:14 +0100
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> <59398711.46744.1491481164274.JavaMail.open-xchange@app06.ox.hosteurope.de> <1070748515.52412.1491603430824.JavaMail.open-xchange@app06.ox.hosteurope.de> <3A6FCC31-F335-40EE-B842-F690E413D3B1@evertype.com>
Message-ID:

On 8 Apr 2017, at 00:28, Rebecca T <637275 at gmail.com> wrote:

> > while evidently there are users who need to send BROCCOLI to one another,
> > nobody but nobody needs to send an 8 x 8 chessboard matrix in a tweet. Get
> > it?
>
> I simply must disagree; sending a textual chessboard sounds awesome! A twitter bot that plays chess with you and shows you a graphical representation of the board would be great!

This isn’t about game play.

Even if you get the UTC to bless chess pieces as emoji (why?) that would not affect this proposal, as other VS characters are used for emoji.
Michael Everson

From asmusf at ix.netcom.com Fri Apr 7 20:02:37 2017
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Fri, 7 Apr 2017 18:02:37 -0700
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> <59398711.46744.1491481164274.JavaMail.open-xchange@app06.ox.hosteurope.de> <1070748515.52412.1491603430824.JavaMail.open-xchange@app06.ox.hosteurope.de> <3A6FCC31-F335-40EE-B842-F690E413D3B1@evertype.com>
Message-ID: <43406988-1699-dbaa-89fc-da37abe796fd@ix.netcom.com>

On 4/7/2017 4:33 PM, Michael Everson wrote:
> On 8 Apr 2017, at 00:28, Rebecca T <637275 at gmail.com> wrote:
>
>>> while evidently there are users who need to send BROCCOLI to one another,
>>> nobody but nobody needs to send an 8 x 8 chessboard matrix in a tweet. Get
>>> it?
>> I simply must disagree; sending a textual chessboard sounds awesome! A twitter bot that plays chess with you and shows you a graphical representation of the board would be great!
> This isn’t about game play.
>
Why rule this out? Once you have a plain text solution, you'll enable any plain text platform.

Seems almost churlish to want to limit what you can do...in what would be after the fact.
A./

From gwalla at gmail.com Fri Apr 7 22:53:43 2017
From: gwalla at gmail.com (Garth Wallace)
Date: Fri, 7 Apr 2017 20:53:43 -0700
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <1070748515.52412.1491603430824.JavaMail.open-xchange@app06.ox.hosteurope.de>
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> <59398711.46744.1491481164274.JavaMail.open-xchange@app06.ox.hosteurope.de> <1070748515.52412.1491603430824.JavaMail.open-xchange@app06.ox.hosteurope.de>
Message-ID:

On Fri, Apr 7, 2017 at 3:17 PM, Christoph Päper wrote:

> Garth Wallace :
> > On Thu, Apr 6, 2017 at 5:19 AM, Christoph Päper <christoph.paeper at crissov.de> wrote:
> >
> > > Although Michael Everson readily dismisses any connection to emojis, (...) normal emoji design actually matches "diagram" notation quite nicely in that all emoji glyphs are rendered within an (ideographic / em) square. Black and white squares are also already available as emojis (...)
> >
> > The only connection this has with emoji is that it uses the variation selector system.
>
> As I've shown, that's not the *only* connection.
>
> > Emoji is a means of indicating full-color embedded images,
>
> That may be how they are often conceived. It's not quite true, though. The first sentence of UTS#51 reads:
>
> Emoji are pictographs (pictorial symbols) that are typically presented in a colorful cartoon form and used inline in text.
>
> That sounds similar, but is different in an important detail: the scope, active vs. passive.
> Emojis do not reference images, but systems present them as images.

My point is that the features that distinguish emoji from other symbols in Unicode are not required, and in many cases not desired, for typesetting chess diagrams. It complicates things for no reason.

> > ... and usually rely on rendering or interpreting systems that substitute graphics,
>
> Such systems are widespread, but emojis don't rely on them since they are true characters.
> Symbol fonts like Font Awesome, for instance, are usually PUA-only, although they should be using codepoints of existing emojis and other symbols as much as possible for fallback and interoperability reasons. They don't because they'd risk that some of their consistent, monochrome glyphs would be replaced with a colorful picture by an overly aggressive system.

And risking that some consistent monochrome glyphs would be replaced with colorful pictures by overly aggressive systems is also something that should be avoided with the chess symbols.

> > or various non-standardized font extensions.
>
> Opentype 1.8 standardizes as many as four approaches to colorful glyphs.

Four currently non-interoperable approaches, AIUI. But this isn't really relevant here anyway.

> > None of that is necessary, or relevant, to chess diagrams.
>
> Chess diagrams (unlike chess notation) are often rendered as graphics, not text. Board and glyphs may have fancy designs and colors, e.g. wooden fields. You would get that for free with emoji presentation, but fonts would not have to use more than two colors. Note that "white" chess pieces should usually not be rendered as outlines only, i.e. with transparent eyes, but with a solid whitish interior. Almost all other pre-emoji characters with "White" in their standard name are hollow. "Black" means always 'solid', but the actual color can be any (including white), although text is still mostly typeset in a blackish color.
Chess diagrams *are* often rendered as graphics. When people want things like wood grain squares, they use graphics. Chess diagrams are *also* frequently typeset with fonts, using the same means as text. When doing so, the results are monochrome, with diagonal hatching for the dark squares. This is well established practice. Full color display of conventionally typeset diagrams would not be expected behavior, nor, in many cases such as publishing, would it be welcome behavior. It's the wrong tool for the job.

Look, this proposal is not about "Wouldn't it be a neat idea if we could make chess diagrams in text?" People had that neat idea before they had the neat idea for Unicode, or for computers for that matter. This is about removing a barrier to people using Unicode instead of various mutually-incompatible dingbat fonts for something they already regularly do.

> > I don't believe emoji are even necessarily fixed-width.
>
> In all existing implementations they are. They are even always square. I'm not sure whether their em square always matches the sinographic ("ideographic") square, but it seems as if it usually does.

Doesn't matter. My point was that fixed width is not an inherent quality of emoji, it's just common. You can't rely on it, and it is not a commonality *in principle* with chess diagram typesetting, just a coincidence.

> > > Without the need for ZWJ sequences, Opentype fonts can employ their Contextual Alternates `calt` feature to select the correct background color in diagram notation: In a sequence of up to eight chess pieces without an empty square with explicit color, an initial U+2656-FE0F White Rook, U+2654-FE0F White King, U+265B-FE0F Black Queen or U+265F-FE0F Black Pawn would default to a black background, U+2659-FE0F White Pawn, U+2655 White Queen, U+265A-FE0F Black King or U+265C-FE0F Black Rook to a white background.
> > > Other than that, each character uses the alternate glyph with opposing background color from its preceding (left-side) glyph. The empty squares work as explicit anchors.
> >
> > I don't see how this would work. Any row with a white rook on the first file would start with a black background? What?
>
> Usually empty squares would determine the backgrounds of adjacent pieces, but if you have a line of adjacent chess pieces (up to 8 by standard rules) without any empty square (e.g. before the first move), there would need to be some kind of heuristic. I chose the colors of pieces that do not come in pairs (i.e. King and Queen) and of the left-most figurines (Rook and Pawn) for a board with white at the bottom, because that is the most common orientation. If you want, I could write and post the code in Adobe OT feature file notation required for `calt` to demonstrate that this would yield results as expected for all full-size 8*8 diagrams and even for many detail diagrams of a section of the board.

I suppose a guess with a 50/50 chance of being wrong is still considered a heuristic, of sorts. I would like to see your proof of concept (I'm not interested in the code, I'd like to see the results you get) since I'm very skeptical that this would work reliably in practice.

One nice thing about the existing VS proposal is that it does not require any heuristics at all. Each square is explicitly marked as light or dark, with no guessing needed.
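
The contrast drawn above, explicit per-square marking versus a contextual guess, can be sketched in a few lines of Python. This is only an illustration under loudly stated assumptions: the thread shows chess pieces being paired with U+FE00 and U+FE01 in a font, but it does not say which selector means a light square and which a dark one, so the assignment below is a guess. The empty-square base characters (U+25A1 and U+25A8) and the names `encode_rank` and `is_dark` are likewise hypothetical. The only chess fact relied on is that a1 is a dark square.

```python
# Sketch only. ASSUMPTIONS (not confirmed by the thread or the proposal):
#   - VS1 (U+FE00) marks a light square, VS2 (U+FE01) a dark square
#   - U+25A1 / U+25A8 serve as the empty light / dark square bases
VS_LIGHT = "\uFE00"
VS_DARK = "\uFE01"
EMPTY_LIGHT = "\u25A1"  # WHITE SQUARE
EMPTY_DARK = "\u25A8"   # SQUARE WITH UPPER RIGHT TO LOWER LEFT FILL

# Encoded chess pieces U+2654..U+265F, keyed by the usual FEN letters.
PIECES = {
    "K": "\u2654", "Q": "\u2655", "R": "\u2656",
    "B": "\u2657", "N": "\u2658", "P": "\u2659",
    "k": "\u265A", "q": "\u265B", "r": "\u265C",
    "b": "\u265D", "n": "\u265E", "p": "\u265F",
}


def is_dark(file_index: int, rank: int) -> bool:
    """Return True for dark squares; a1 (file_index 0, rank 1) is dark."""
    return (file_index + rank) % 2 == 1


def encode_rank(row: str, rank: int) -> str:
    """Encode one rank given as 8 characters, files a to h, '.' for empty.

    Every square gets its own selector, so a reader of the text never
    has to infer a square's colour from its neighbours.
    """
    out = []
    for file_index, ch in enumerate(row):
        dark = is_dark(file_index, rank)
        base = PIECES.get(ch, EMPTY_DARK if dark else EMPTY_LIGHT)
        out.append(base + (VS_DARK if dark else VS_LIGHT))
    return "".join(out)


if __name__ == "__main__":
    # Print the code points of White's back rank in the starting position.
    print(" ".join(f"{ord(c):04X}" for c in encode_rank("RNBQKBNR", 1)))
```

Because the colour travels with each square, a renderer that ignores the selectors still shows the plain piece characters in the right order, which is the legible fallback discussed earlier in the thread.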
From everson at evertype.com Sat Apr 8 06:10:18 2017
From: everson at evertype.com (Michael Everson)
Date: Sat, 8 Apr 2017 12:10:18 +0100
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To: <43406988-1699-dbaa-89fc-da37abe796fd@ix.netcom.com>
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> <59398711.46744.1491481164274.JavaMail.open-xchange@app06.ox.hosteurope.de> <1070748515.52412.1491603430824.JavaMail.open-xchange@app06.ox.hosteurope.de> <3A6FCC31-F335-40EE-B842-F690E413D3B1@evertype.com> <43406988-1699-dbaa-89fc-da37abe796fd@ix.netcom.com>
Message-ID:

On 8 Apr 2017, at 02:02, Asmus Freytag wrote:

>> This isn’t about game play.
>
> Why rule this out? Once you have a plain text solution, you'll enable any plain text platform.
>
> Seems almost churlish to want to limit what you can do...in what would be after the fact.

Developers can already use the encoded chess characters in game apps if they want.

If we have a set of standardized variation sequences for chess notation, then if game developers want to use them, who is to complain? But that is not the point of this proposal, which is to enable people working with chess notation to be able to use the UCS (which they aren’t doing). An app interface has not the same plain-text requirement that people working with chess data do.

(They ARE using fonts, which shows they want to do this in text. They are NOT using UCS characters, and they do NOT have a coherent model amongst any of their hacks.)
Michael Everson

From verdy_p at wanadoo.fr Sat Apr 8 07:01:32 2017
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sat, 8 Apr 2017 14:01:32 +0200
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> <59398711.46744.1491481164274.JavaMail.open-xchange@app06.ox.hosteurope.de> <3A6FCC31-F335-40EE-B842-F690E413D3B1@evertype.com> <43406988-1699-dbaa-89fc-da37abe796fd@ix.netcom.com>
Message-ID:

2017-04-08 13:10 GMT+02:00 Michael Everson :

> On 8 Apr 2017, at 02:02, Asmus Freytag wrote:
>
> >> This isn’t about game play.
> >
> > Why rule this out? Once you have a plain text solution, you'll enable any plain text platform.
> >
> > Seems almost churlish to want to limit what you can do...in what would be after the fact.
>
> Developers can already use the encoded chess characters in game apps if they want.
>
> If we have a set of standardized variation sequences for chess notation, then if game developers want to use them, who is to complain? But that is not the point of this proposal, which is to enable people working with chess notation to be able to use the UCS (which they aren’t doing). An app interface has not the same plain-text requirement that people working with chess data do.
>
> (They ARE using fonts, which shows they want to do this in text. They are NOT using UCS characters, and they do NOT have a coherent model amongst any of their hacks.)

Maybe they use fonts, but is OpenType the best tool for applications to create indexed collections of glyphs? SVG fonts are much easier to develop and change as they want.
And SVG glyphs are easier to integrate in derived documents. For implementing a simple game, they don't need large collections. They can more easily integrate photographic features, or 3D features. OpenType implementations suffer from a huge resistance for newer features; many features don't work if at the same time the Opentype renderer is not updated on the supporting platform (OS or web browser).

OK there are some new SVG features as well, but they are much more tested than those in OpenType and much better documented, and don't suffer from various proprietary extensions (such as font hinting which is definitely not "Open" and extremely poorly documented with many internal tricks made to restrict their use on specific OSes, plus stupid limitations/bugs in the way they were encoded, with no vision at all for their evolution or interaction with other features)...

From everson at evertype.com Sat Apr 8 07:10:15 2017
From: everson at evertype.com (Michael Everson)
Date: Sat, 8 Apr 2017 13:10:15 +0100
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> <59398711.46744.1491481164274.JavaMail.open-xchange@app06.ox.hosteurope.de> <3A6FCC31-F335-40EE-B842-F690E413D3B1@evertype.com> <43406988-1699-dbaa-89fc-da37abe796fd@ix.netcom.com>
Message-ID:

On 8 Apr 2017, at 13:01, Philippe Verdy wrote:

> (They ARE using fonts, which shows they want to do this in text. They are NOT using UCS characters, and they do NOT have a coherent model amongst any of their hacks.)
>
> Maybe they use fonts,

There is no maybe about it.

> but is OpenType the best tool for applications to create indexed collections of glyphs?

Standardized variation sequences for specific glyph presentation is a part of our standard. I have implemented this for the purposes described and it works. I implemented it with William’s font and it works. William implemented it in his font on his own and it works.

What does this have to do with “indexed collections of glyphs”?

> SVG fonts are much easier to develop and change as they want.

Red herring.

> And SVG glyphs are easier to integrate in derived documents.

Nonsense.

> For implementing a simple game, they don't need large collections. They can more easily integrate photographic features, or 3D features. OpenType implementations suffer from a huge resistance for newer features; many features don't work if at the same time the Opentype renderer is not updated on the supporting platform (OS or web browser).

We’re not proposing to “implement a game”.

> OK there are some new SVG features as well, but they are much more tested than those in OpenType and much better documented, and don't suffer from various proprietary extensions (such as font hinting which is definitely not "Open" and extremely poorly documented with many internal tricks made to restrict their use on specific OSes, plus stupid limitations/bugs in the way they were encoded, with no vision at all for their evolution or interaction with other features)...

This has nothing to do with our proposal, or with the current practice of the chess community.
Michael Everson

From richard.wordingham at ntlworld.com Sat Apr 8 08:49:45 2017
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sat, 8 Apr 2017 14:49:45 +0100
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <20170403220348.3efb4d1a@JRWUBU2> <421BC3D3-DF71-4D76-93E7-CAFDEDFBFFCB@evertype.com> <20170404004701.19ad750c@JRWUBU2> <20170404185431.07dbe483@JRWUBU2> <07DD2CE0-5510-49A3-883A-EF7A1A34C80E@evertype.com> <20170405045056.309b67b4@JRWUBU2>
Message-ID: <20170408144945.65a457e2@JRWUBU2>

On Wed, 5 Apr 2017 14:08:03 +0100
Michael Everson wrote:

> On 5 Apr 2017, at 04:50, Richard Wordingham wrote:
>
> >> Why would anyone make a font that supports the variants for drawing chessboards (which require the encoded characters 2654..265F) not put in glyphs for those?
> >
> > A stop-gap font based on poor glyphs comes to mind.
>
> For pity’s sake. Yes, people can make and distribute crappy fonts. What are you on about? This is not serious criticism and what you suggest is no realistic scenario.
>
> > Is this a sequence for the GSUB table or for the cmap table?
>
> It’s the OpenType table. I said this already, twice. The entries are:
>
> sub uni2654 uniFE00 by uni2654FE00 ;
> sub uni2654 uniFE01 by uni2654FE01 ;
> sub uni2655 uniFE00 by uni2655FE00 ;
> sub uni2655 uniFE01 by uni2655FE01 ;
>
> and so on.
>
> > The font I have in mind would have no entry for U+2654 in its cmap format 4 subtable but would, following the proposal you put up on Saturday, have entries for and in its cmap format 14 subtable.
>
> OK, to hell with the cmap format 14 table. I don’t know what this is. I didn’t edit any such table.
> I have text in my proposal about that because I took it from a proposal for Variation Sequences by Ken Lunde. I thought that was the same as the mechanism which I used to create the tables in my font, which worked, and worked well. I have implemented it. I would use my fonts (with variation selectors) in print and I would distribute the fonts.
>
> PLEASE LOOK. You have said that you would add and and that’s JUST what I have in my opentype table. So you and I are doing the same thing.

It may look like the same thing, but it is potentially quite different. I have created a TrueType, not OpenType, font (except that it has an OS/2 table) 'Bad Chess' to demonstrate (http://wrdingham.co.uk/lanna/bad_chess.htm) that one does not need a GSUB table or similar to define a glyph mapping for variation sequences. Examine the font itself if you find this hard to believe. There is no character to glyph map for variation selectors. Unfortunately, I took so long to start creating such fonts that I didn't have time to create glyphs. Instead, I reused some free (as in speech) glyphs to demonstrate the effects.

My original hope was that it would map the variation sequences and to glyphs without having a map for U+2654. However, if there is no character to glyph map for U+2654, both Firefox and MS Edge choose to use other fonts for the sequences, even though they will then naturally fall back to a map for U+2654, ignoring the variation selector. When I added a map to a glyph for U+2654, then the sequences worked, but then the font is honour-bound to have a reasonable glyph for use in 'notation' text, not just for chessboards. (The 'Bad Chess' font is currently a dishonourable proof of principle font.)

Note that when Format 14 cmap subtables are understood, it does not matter whether font ligatures are heeded by the renderer.
This contrasts with the suggestion that square + occupying piece be encoded as a ligature of square and occupying piece, which depends for legibility on the ligation actually happening. > > This approach is entirely consistent with the conception of > > variation sequences as pseudo-encoding. > > No, it’s a way of showing a particular glyph for an underlying base > character. I was referring to the mechanics of converting a code to a glyph, not the issue of whether, if at all, a glyph family should be encoded. > > I could make such a font (for one chesspiece), but it would take at > > least an evening. > > I’ve done it for three fonts already, Ludus, Condal, and William’s > Quest font. Did you create the glyphs for the chess pieces de novo, or merely adapt them to show occupied squares? Licences are an issue if one wishes to share fonts. Richard. From verdy_p at wanadoo.fr Sat Apr 8 08:50:33 2017 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Sat, 8 Apr 2017 15:50:33 +0200 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> <59398711.46744.1491481164274.JavaMail.open-xchange@app06.ox.hosteurope.de> <3A6FCC31-F335-40EE-B842-F690E413D3B1@evertype.com> <43406988-1699-dbaa-89fc-da37abe796fd@ix.netcom.com> Message-ID: 2017-04-08 14:10 GMT+02:00 Michael Everson : > On 8 Apr 2017, at 13:01, Philippe Verdy wrote: > > > (They ARE using fonts, which shows they want to do this in text. They > are NOT using UCS characters, and they do NOT have a coherent model amongst > any of their hacks.) > > > > May be they use fonts, > > There is no maybe about it.
> There REALLY IS a "maybe", because this is not required at all, and most chess applications do not use any "font" (most of them display bitmap icons, or custom 2D/3D graphics) > > > but is OpenType the best tool for applications to create indexed > collections of glyphs? > > Standardized variation sequences for specific glyph presentation is a part > of our standard. I have implemented this for the purposes described and it > works. I implemented it with William’s font and it works. William > implemented it in his font on his own and it works. > > What does this have to do with “indexed collections of glyphs”? > > > SVG fonts are much easier to develop and change as they want. > > Red herring. > What ??? Black herring here ! > > > And SVG glyphs are easier to integrate in derived documents. > > Nonsense. > Non sense reply !!! Custom fonts are hard to integrate as they depend on renderers (which most applications don't want to support directly, they are part of a browser or OS). And OpenType fonts are much less flexible for what applications want to do. SVG allows much easier variations and effects. There are tons of tools or stylesheets for that, which will not work on glyphs in OpenType fonts. > > > For implementing a simple game, they don't need large collections. They > can more easily integrate photographic features, or 3D features. OpenType > implementations suffer from a huge resistance for newer features many > features don't work if at the same time the Opentype renderer is not > updated on the supporting platform (OS or web browser) > > We’re not proposing to “implement a game”. You were yourself speaking about applications, me too, not just a "game". -------------- next part -------------- An HTML attachment was scrubbed...
URL: From everson at evertype.com Sat Apr 8 08:59:17 2017 From: everson at evertype.com (Michael Everson) Date: Sat, 8 Apr 2017 14:59:17 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> <59398711.46744.1491481164274.JavaMail.open-xchange@app06.ox.hosteurope.de> <3A6FCC31-F335-40EE-B842-F690E413D3B1@evertype.com> <43406988-1699-dbaa-89fc-da37abe796fd@ix.netcom.com> Message-ID: <39404893-7190-4B8A-94B4-A0549EDDCC8E@evertype.com> On 8 Apr 2017, at 14:50, Philippe Verdy wrote: >>> May be they use fonts, >> >> There is no maybe about it. > > There REALLY IS a "maybe", because this is not required at all, and most chess applications do not use any "font" (most of them display bitmap icons, or custom 2D/3D graphics) The proposal is not based on the practice of chess game applications. The proposal deals with the problem of typesetting chess diagrams. This is a publishing function. >>> And SVG glyphs are easier to integrate in derived documents. >> >> Nonsense. > > Non sense reply !!! Custom fonts What? Fonts. You know. Fonts. TrueType with OpenType tables for glyph substitution. This is nothing special. This is bog-standard. > are hard to integrate as they depend on renderers (which most applications don't want to support directly, they are part of a browser or OS). And OpenType fonts are much less flexible for what applications want to do. SVG allows much easier variations and effects. There are tons of tools or stylesheets for that, which will not work on glyphs in OpenType fonts. This has nothing to do with the proposal. >> We’re not proposing to “implement a game”.
> > You were yourself speaking about applications, me too, not just a "game". No, I wasn’t. Michael Everson From verdy_p at wanadoo.fr Sat Apr 8 09:14:32 2017 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Sat, 8 Apr 2017 16:14:32 +0200 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <39404893-7190-4B8A-94B4-A0549EDDCC8E@evertype.com> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> <59398711.46744.1491481164274.JavaMail.open-xchange@app06.ox.hosteurope.de> <3A6FCC31-F335-40EE-B842-F690E413D3B1@evertype.com> <43406988-1699-dbaa-89fc-da37abe796fd@ix.netcom.com> <39404893-7190-4B8A-94B4-A0549EDDCC8E@evertype.com> Message-ID: 2017-04-08 15:59 GMT+02:00 Michael Everson : > >> We’re not proposing to “implement a game”. > > > > You were yourself speaking about applications, me too, not just a "game". > > No, I wasn’t. I can quote your own message just posted 3 hours ago? YOU REALLY USED the term "game" and wanted developers to use fonts for them. This is definitely not what most chess game developers do and have done since long, because fonts are definitely not easily integrable and give unpredictable results. They would not accept the kind of fallbacks you document for encoding in plain text. QUOTING YOUR OWN MESSAGE BELOW. 2017-04-08 13:10 GMT+02:00 Michael Everson : > Developers can already use the encoded chess characters in game apps if > they want. > > If we have a set of standardized variation sequences for chess notation, > then if game developers want to use them, who is to complain?
But that is > not the point of this proposal, which is to enable people working with > chess notation to be able to use the UCS (which they aren’t doing). An app > interface has not the same plain-text requirement that people working with > chess data do. > > (They ARE using fonts, which shows they want to do this in text. They are > NOT using UCS characters, and they do NOT have a coherent model amongst any > of their hacks.) > From wjgo_10009 at btinternet.com Sat Apr 8 05:03:31 2017 From: wjgo_10009 at btinternet.com (William_J_G Overington) Date: Sat, 8 Apr 2017 11:03:31 +0100 (BST) Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> <59398711.46744.1491481164274.JavaMail.open-xchange@app06.ox.hosteurope.de> <1070748515.52412.1491603430824.JavaMail.open-xchange@app06.ox.hosteurope.de> <3A6FCC31-F335-40EE-B842-F690E413D3B1@evertype.com> Message-ID: <10671671.8753.1491645811693.JavaMail.defaultUser@defaultHost> I have made an OpenType font that implements Michael's proposed format and the extension of having variation selectors for the border units that Michael kindly added during the discussion. I have published the font and the font is available, free, from the following forum thread. http://forum.high-logic.com/viewtopic.php?f=10&t=7033 Registration to the forum is possible, but is not necessary in order to download the font.
The font is Quest text 2017 which is a font derived from the Quest text font that was discussed in this thread a few days ago, so I had many of the necessary glyphs already available to use. I enjoyed making the font. Michael's format works well in the Serif PagePlus X7 desktop publishing program where I tested it. William Overington Saturday 8 April 2017 From richard.wordingham at ntlworld.com Sat Apr 8 10:03:15 2017 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Sat, 8 Apr 2017 16:03:15 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <5F9BCF6C-D351-4DFF-A972-2B251B4282CF@evertype.com> References: <4919039.28328.1491394049217.JavaMail.root@webmail43.bt.ext.cpcloud.co.uk> <30044668.29483.1491394966847.JavaMail.defaultUser@defaultHost> <10737369.47241.1491409684891.JavaMail.defaultUser@defaultHost> <5F9BCF6C-D351-4DFF-A972-2B251B4282CF@evertype.com> Message-ID: <20170408160315.1305a8d5@JRWUBU2> On Wed, 5 Apr 2017 20:32:44 +0100 Michael Everson wrote: > On 5 Apr 2017, at 20:13, Philippe Verdy wrote: > Chess characters aren’t emojis. That doesn't mean that solutions applicable to emojis might not be applicable elsewhere. > The logic of the use of VS in this proposal is no different from the > logic used with them in maths, or in Myanmar, or even in some emoji. It is. As far as I am aware, it is not completely wrong to use a 'dotted' Khamti-style letter for Burmese. Assuming you are not just looking for a technically simple solution, but also an honest one, I can only think you are treating a depiction using 'black squares' for empty 'black' squares, 'white squares' for empty 'white' squares, and chess pieces for occupied squares as the starting point. However, you are proposing to adjoin notation the squares the pieces are sitting on.
Someone, intending to be believed, claimed that a white chess square was not a 'nirvanic nothing', despite also claiming that it did not naturally have colour, but was transparent in display terms. (The chessboard nearest me has yellow white squares.) This is stretching a principle, rather like the piecemeal addition of subscript and superscript letters. > If there are other uses which can be made of chess pieces, then those > uses can be investigated in due course by someone interested in that. > > > and various board types may be used (not only with square cells, > > for example there are rectangular ones or triangular for Shogi > > pieces in Japan, > > Shogi is not chess. Shogi notation is not like chess notation, > either. Try to focus on the actual proposal. It is better to know where the encoding additions are likely to take us. Are we heading for a dartboard notation? Richard. From richard.wordingham at ntlworld.com Sat Apr 8 10:41:29 2017 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Sat, 8 Apr 2017 16:41:29 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <2098249861.45062.1491469723263.JavaMail.open-xchange@app06.ox.hosteurope.de> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <2098249861.45062.1491469723263.JavaMail.open-xchange@app06.ox.hosteurope.de> Message-ID: <20170408164129.600a11a8@JRWUBU2> On Thu, 6 Apr 2017 11:08:43 +0200 (CEST) Christoph Päper wrote: > Richard Wordingham : > > If the variation selectors are ignored, these simplify to: > > > > white square > > hatched square > > specific piece > > > > This preserves all the information; the pattern of squares is known > > in advance and therefore redundant.
> > I argued before that > > empty square > specific piece > > would already be enough to carry all the required semantics, at least > for drawing complete boards, because the coloring pattern is simple, > well-known and redundant. I meant redundant in that the colour of a single empty square determines the colour of the rest. Is there absolutely no interest in depicting games played with the board the wrong way round? Richard. From richard.wordingham at ntlworld.com Sat Apr 8 10:56:18 2017 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Sat, 8 Apr 2017 16:56:18 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: <9C3A99D8-D873-41E6-8014-D163C4EF2597@evertype.com> Message-ID: <20170408165618.445799ab@JRWUBU2> On Thu, 06 Apr 2017 18:26:39 +0200 Kent Karlsson wrote: > All the characters in the "chess board lines" (apart from spaces, if > any), are of bidi category ON or NSM. So there is no character that > "sets" a bidi direction of the lines ("paragraphs"). So if the bidi > setting for display is set to default to RTL, each of the chess board > lines will be reversed in display. Now, since the border characters > are not mirrored, the left and right side of the board side lines > will be somewhat botched. Which is very visible in that it is ugly. > (And I guess(!) the reader will notice that...) I certainly did when I flipped the 'higher level protocol' on one of the chessboards posted earlier. I thought I'd already explained that all one needs to do is to bracket the board with U+200E LEFT-TO-RIGHT MARK before and after; the character is designed for situations like this. Richard. 
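The bracketing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the proposal: the helper names `protect_board_lines` and `has_strong_direction` and the sample row are hypothetical, and the sketch assumes each board line is treated as its own paragraph by the renderer, so every non-empty line is bracketed rather than the board as a whole. It uses the standard `unicodedata` module to show that the board characters carry no strong direction on their own, while the bracketed line does.

```python
import unicodedata

LRM = "\u200E"  # LEFT-TO-RIGHT MARK, a strong left-to-right character


def protect_board_lines(board: str) -> str:
    """Bracket every non-empty line of a text chessboard with LRM."""
    return "\n".join(
        LRM + line + LRM if line else line
        for line in board.split("\n")
    )


def has_strong_direction(text: str) -> bool:
    """Return True if any character has a strong bidi class (L, R or AL)."""
    return any(
        unicodedata.bidirectional(ch) in ("L", "R", "AL") for ch in text
    )


# Hypothetical board row: left border, white king + VS-1, empty square, right border.
row = "\u258F\u2654\uFE00\u25A1\u2595"
protected = protect_board_lines(row)
```

With the marks in place, the neutral characters between them sit between two strong left-to-right characters, which is what lets the bidirectional algorithm keep the row in left-to-right order even when the surrounding display defaults to right-to-left.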
From everson at evertype.com Sat Apr 8 14:20:26 2017 From: everson at evertype.com (Michael Everson) Date: Sat, 8 Apr 2017 20:20:26 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> <59398711.46744.1491481164274.JavaMail.open-xchange@app06.ox.hosteurope.de> <3A6FCC31-F335-40EE-B842-F690E413D3B1@evertype.com> <43406988-1699-dbaa-89fc-da37abe796fd@ix.netcom.com> <39404893-7190-4B8A-94B4-A0549EDDCC8E@evertype.com> Message-ID: > On 8 Apr 2017, at 15:14, Philippe Verdy wrote: > > 2017-04-08 15:59 GMT+02:00 Michael Everson : > >> We’re not proposing to “implement a game”. > > > > You were yourself speaking about applications, me too, not just a "game". > > No, I wasn’t. > > I can quote your own message just posted 3 hours ago? YOU REALLY USED the term "game" and wanted developers to use fonts for them. Please learn to read. > This is definitely not what most chess game developers do and have done since long, because fonts are definitely not easily integrable and give unpredictable results. They would not accept the kind of fallbacks you document for encoding in plain text. > > QUOTING YOUR OWN MESSAGE BELOW. > > 2017-04-08 13:10 GMT+02:00 Michael Everson : > Developers can already use the encoded chess characters in game apps if they want. > > If we have a set of standardized variation sequences for chess notation, then if game developers want to use them, who is to complain? This means, I would not complain, because that isn’t the point of the proposal, and if they use text and fonts or if they use graphics is of no consequence.
They can do whatever they need for their purposes. > But that is not the point of this proposal, See? Gaming apps is not the point of the proposal. > which is to enable people working with chess notation to be able to use the UCS (which they aren’t doing). An app interface has not the same plain-text requirement that people working with chess data do. > > (They ARE using fonts, which shows they want to do this in text. They are NOT using UCS characters, and they do NOT have a coherent model amongst any of their hacks.) > From asmusf at ix.netcom.com Sat Apr 8 16:23:58 2017 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Sat, 8 Apr 2017 14:23:58 -0700 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <59398711.46744.1491481164274.JavaMail.open-xchange@app06.ox.hosteurope.de> <3A6FCC31-F335-40EE-B842-F690E413D3B1@evertype.com> <43406988-1699-dbaa-89fc-da37abe796fd@ix.netcom.com> <39404893-7190-4B8A-94B4-A0549EDDCC8E@evertype.com> Message-ID: <04fa59f3-e249-ab6d-b08e-64bf58b3ff07@ix.netcom.com> An HTML attachment was scrubbed... URL: From everson at evertype.com Sat Apr 8 17:28:58 2017 From: everson at evertype.com (Michael Everson) Date: Sat, 8 Apr 2017 23:28:58 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <04fa59f3-e249-ab6d-b08e-64bf58b3ff07@ix.netcom.com> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <59398711.46744.1491481164274.JavaMail.open-xchange@app06.ox.hosteurope.de> <3A6FCC31-F335-40EE-B842-F690E413D3B1@evertype.com> <43406988-1699-dbaa-89fc-da37abe796fd@ix.netcom.com> <39404893-7190-4B8A-94B4-A0549EDDCC8E@evertype.com> <04fa59f3-e249-ab6d-b08e-64bf58b3ff07@ix.netcom.com> Message-ID: <41D94AA7-CCB1-4537-ACA6-76E892EBF945@evertype.com> On 8 Apr 2017, at 22:23, Asmus Freytag wrote: > Time for Sarasvati to pull the plug on this thread? Useful input has been gratefully received.
I thank those who gave it. Michael Everson From kent.karlsson14 at telia.com Sun Apr 9 12:02:21 2017 From: kent.karlsson14 at telia.com (Kent Karlsson) Date: Sun, 09 Apr 2017 19:02:21 +0200 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <6490CD43-45AF-40C0-9AB4-A2F8937DFF4E@evertype.com> Message-ID: Den 2017-04-06 01:25, skrev "Michael Everson" :
> Oh, here. This is what I would add.
>
> 2581 FE00; Chessboard box drawing; # LOWER ONE EIGHTH BLOCK
> 258F FE00; Chessboard box drawing; # LEFT ONE EIGHTH BLOCK
> 2594 FE00; Chessboard box drawing; # UPPER ONE EIGHTH BLOCK
> 2595 FE00; Chessboard box drawing; # RIGHT ONE EIGHTH BLOCK
> 2596 FE00; Chessboard box drawing; # QUADRANT LOWER LEFT
> 2597 FE00; Chessboard box drawing; # QUADRANT LOWER RIGHT
> 2598 FE00; Chessboard box drawing; # QUADRANT UPPER LEFT
> 259D FE00; Chessboard box drawing; # QUADRANT UPPER RIGHT
Instead of that, I'd suggest:
2500 FE00; Chessboard box drawing (top); # BOX DRAWINGS LIGHT HORIZONTAL (U+2500)
2500 FE01; Chessboard box drawing (bottom); # BOX DRAWINGS LIGHT HORIZONTAL (U+2500)
2502 FE00; Chessboard box drawing (left); # BOX DRAWINGS LIGHT VERTICAL (U+2502)
2502 FE01; Chessboard box drawing (right); # BOX DRAWINGS LIGHT VERTICAL (U+2502)
250C FE00; Chessboard box drawing; # BOX DRAWINGS LIGHT DOWN AND RIGHT (U+250C)
2510 FE00; Chessboard box drawing; # BOX DRAWINGS LIGHT DOWN AND LEFT (U+2510)
2514 FE00; Chessboard box drawing; # BOX DRAWINGS LIGHT UP AND RIGHT (U+2514)
2518 FE00; Chessboard box drawing; # BOX DRAWINGS LIGHT UP AND LEFT (U+2518)
These are more likely to be supported (by (fixed-width) fonts) in fallback than the ones you suggest. They are also intended for box drawing (unlike the ones you suggest).
Perhaps also, since you exemplify also with double borders in your document:
2550 FE00; Chessboard box drawing (top); # BOX DRAWINGS DOUBLE HORIZONTAL (U+2550)
2550 FE01; Chessboard box drawing (bottom); # BOX DRAWINGS DOUBLE HORIZONTAL (U+2550)
2551 FE00; Chessboard box drawing (left); # BOX DRAWINGS DOUBLE VERTICAL (U+2551)
2551 FE01; Chessboard box drawing (right); # BOX DRAWINGS DOUBLE VERTICAL (U+2551)
2554 FE00; Chessboard box drawing; # BOX DRAWINGS DOUBLE DOWN AND RIGHT (U+2554)
2557 FE00; Chessboard box drawing; # BOX DRAWINGS DOUBLE DOWN AND LEFT (U+2557)
255A FE00; Chessboard box drawing; # BOX DRAWINGS DOUBLE UP AND RIGHT (U+255A)
255D FE00; Chessboard box drawing; # BOX DRAWINGS DOUBLE UP AND LEFT (U+255D)
/Kent K From christoph.paeper at crissov.de Mon Apr 10 03:49:33 2017 From: christoph.paeper at crissov.de (=?UTF-8?Q?Christoph_P=C3=A4per?=) Date: Mon, 10 Apr 2017 10:49:33 +0200 (CEST) Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> <59398711.46744.1491481164274.JavaMail.open-xchange@app06.ox.hosteurope.de> <1070748515.52412.1491603430824.JavaMail.open-xchange@app06.ox.hosteurope.de> Message-ID: <21251823.54972.1491814173063.JavaMail.open-xchange@app07.ox.hosteurope.de> Garth Wallace : > > On Fri, Apr 7, 2017 at 3:17 PM, Christoph Päper > wrote: > > > > That sounds similar, but is different in an important detail: the scope, > > active > > vs. passive.
Emojis do not reference images, but systems present them as > > images. > > My point is that the features that distinguish emoji from other symbols in > Unicode are not required, > and in many cases not desired, for typesetting chess diagrams. It complicates > things for no reason. If Unicode chess diagrams used VS-16 instead of VS-1 and VS-2, users could one day choose a font that fakes marble, wood, glass, steel or just some random color or even animation for pieces and board squares. Since this kind of customization is a common feature in chess applications, I'd expect it to be a welcome feature for textual diagrams as well, even if it's not used (much) in print books. Within the web ecosystem that relies upon CSS for styling, authors and readers could very well and easily use different designs. With non-emoji chess characters, the differences would mostly be limited to glyph outlines. Is that "no reason"? > And risking that some consistent monochrome glyphs would be replaced with > colorful pictures > by overly aggressive systems is also something that should be avoided with the > chess symbols. VS-15 should be better at that than any other variation selector. I know that recent Samsung Android devices provide emoji glyphs for chess pieces, but have no such device available to test whether and how they react differently on no VS, VS-1..14, VS-15 and VS-16, with and without a supporting font provided. > Chess diagrams *are* often rendered as graphics. When people want things like > wood grain squares, they use graphics. See above: with emoji chess pieces users would have the choice to display textual board diagrams in their preferred style. In some cases, people and users are authors or editors, but readers elsewhere. > Chess diagrams are *also* frequently typeset with fonts, using the same means > as text. > When doing so, the results are monochrome, with diagonal hatching for the dark > squares. > This is well established practice. 
I do not contest any of that. > Full color display of conventionally typeset diagrams would not be expected > behavior, > nor, in many cases such as publishing, would it be welcome behavior. It's the > wrong tool for the job. Again: emoji glyphs are not required to use more than a single color, cf. I-Mode, Android 4.3, Windows 7, Symbola, Emojione B&W, Adobe Source Emoji. > Look, this proposal is not about "Wouldn't it be a neat idea if we could make > chess diagrams in text?" People had that neat idea before they had the neat > idea for Unicode, or for computers for that matter. This is about removing a > barrier to people using Unicode instead of various mutually-incompatible > dingbat fonts for something they already regularly do. I understand that perfectly well. Currently, Unicode chess pieces are only well-suited for figurine notation, not for 2D diagrams. I even agree with the approach to use variation selectors. I just think that there would be significant positive synergy from reusing the infrastructure already established for emojis. > Doesn't matter. My point was that fixed width is not an inherent quality of > emoji, it's just common. That's what I'm telling you about colorfulness. I provided several counter examples for the latter above, but know none for the former. UTS#51 has this to say on the matter: Current practice is for emoji to have a square aspect ratio, deriving from their origin in Japanese. For interoperability, it is recommended that this practice be continued with current and future emoji. > > Usually empty squares would determine the backgrounds of adjacent pieces, > > but if > > you have a line of adjacent chess pieces (up to 8 by standard rules) > > without any > > empty square (e.g. before the first move), there would need to be some kind > > of > > heuristic. I chose the colors of pieces that do not come in pairs (i.e. 
> > King and > > Queen) and of the left-most figurines (Rook and Pawn) for a board with > > white at > > the bottom, because that is the most common orientation. (...) > > this would yield results as expected for all full-size 8*8 > > diagrams and even for many detail diagrams of a section of the board. > > I suppose a guess with a 50/50 chance of being wrong is still considered a > heuristic, of sorts. > I would like to see your proof of concept (...) since I'm very skeptical that > this would work reliably in practice. It works with 100% accuracy for every row that contains at least one empty square (with explicit color). For pieces of all the same color, this basically only happens for the two rows at the top and bottom in the initial position. These are taken care of by examining the first character in a series of eight: rook or pawn, black or white. I've added king and queen only for robustness in diagrams that do not show the full 8 by 8 standard board. Even where it could fail, the diagram would probably be just as valid for the alternate alternating pattern. > One nice thing about the existing VS proposal is that it does not require any > heuristics at all. > Each square is explicitly marked as light or dark, with no guessing needed. The UI/UX drawback is that authors have to explicitly mark every field -- unless you put the heuristics there. From jsbien at mimuw.edu.pl Mon Apr 10 04:54:20 2017 From: jsbien at mimuw.edu.pl (Janusz S. =?utf-8?Q?Bie=C5=84?=) Date: Mon, 10 Apr 2017 11:54:20 +0200 Subject: Unicode vs. Unikod Message-ID: <86shlgk2pf.fsf@mimuw.edu.pl> This is a long overdue issue, but better late than never. 
To make a long story short, I think that the word "Unikod" should not be used in the Polish translation of "What is Unicode": http://www.unicode.org/standard/translations/polish.html The word "Unikod", to the best of my knowledge, has been coined long ago by Piotr Trzcionkowski, who registered also the domain www.unikod.pl used to advocate Unicode in Poland. "Unicode" as a trademark should not be translated, for me this is quite obvious. This is actually the case in almost all language versions of "What is Unicode" using the Latin script (with the exception of Esperanto and Lithuanian, where it can be probably justified by some grammar rules, and Polish). This also seems to be the case in various language versions of Wikipedia (I've checked only some of them) with the exception of the Polish one which uses "Unikod" as the primary entry. The occurrence of "Unikod" on the Unicode site may be interpreted as an official acceptance of this equivalent. I hope this is not the case. I would like to clarify the matter before engaging myself in a discussion about introducing the "Unicode" primary entry in Polish Wikipedia. You can check the usage of "Unicode" and "Unikod" in Polish not only with Google but also in the National Corpus of Polish: http://nkjp.pl/ There are 786 occurrences of "Unicode" coming mainly from published books and 102 occurrences of "Unikod", mainly in Usenet postings and Wikipedia discussions. Grammatical Dictionary of Polish contains only "unicode": http://sgjp.pl/leksemy/#73537/unicode Polish versions of Windows use "Unicode". A localization dictionary http://www.btinfodictionary.com/ also preserves "Unicode". Actually I don't mind using "Unikod" (or better "unikod") informally, as e.g. spelling "Unikodem" is simpler than "Unicode'em" (instrumental singular) and "Unikodzie" looks better in Polish than "Unicodzie" (locative singular), moreover there is no doubt how to pronounce it.
This is probably the reason why, to my surprise, the word was introduced also in some other Slavonic languages, e.g. https://en.wiktionary.org/wiki/Unikod. My point is only that both "What is Unicode" and Polish Wikipedia primary entry should use the original spelling. Best regards Janusz -- , Prof. dr hab. Janusz S. Bien - Uniwersytet Warszawski (Katedra Lingwistyki Formalnej) Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department) jsbien at uw.edu.pl, jsbien at mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/ From everson at evertype.com Mon Apr 10 05:19:24 2017 From: everson at evertype.com (Michael Everson) Date: Mon, 10 Apr 2017 11:19:24 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: Message-ID: <7249D55E-B774-4079-A212-0E72D81C4966@evertype.com> Kent, I believe the box drawing characters are for drawing boxes and grids on computer terminals, which is not the same thing as scoring a line around a set of 64 graphic images. I don’t want to get mixed up in using the box-drawing characters. The characters which I have chosen work fine and to my mind suit the application better. I also don’t want to complicate chess fonts by having to have multiple choices within a font for bordering. For one thing, single-rule and double-rule bordering is by no means the gamut of possibility. Chess fonts do not have to be swiss-army knives. Thank you for your consideration, but I will stick with using the ⅛-block and quadrant characters. Michael Everson > On 9 Apr 2017, at 18:02, Kent Karlsson wrote: > > > Den 2017-04-06 01:25, skrev "Michael Everson" : > > > Oh, here. This is what I would add.
> > > > 2581 FE00; Chessboard box drawing; # LOWER ONE EIGHTH BLOCK > > 258F FE00; Chessboard box drawing; # LEFT ONE EIGHTH BLOCK > > 2594 FE00; Chessboard box drawing; # UPPER ONE EIGHTH BLOCK > > 2595 FE00; Chessboard box drawing; # RIGHT ONE EIGHTH BLOCK > > 2596 FE00; Chessboard box drawing; # QUADRANT LOWER LEFT > > 2597 FE00; Chessboard box drawing; # QUADRANT LOWER RIGHT > > 2598 FE00; Chessboard box drawing; # QUADRANT UPPER LEFT > > 259D FE00; Chessboard box drawing; # QUADRANT UPPER RIGHT > > Instead of that, I'd suggest: > 2500 FE00; Chessboard box drawing (top); # BOX DRAWINGS LIGHT HORIZONTAL (U+2500) > 2500 FE01; Chessboard box drawing (bottom); # BOX DRAWINGS LIGHT HORIZONTAL (U+2500) > 2502 FE00; Chessboard box drawing (left); # BOX DRAWINGS LIGHT VERTICAL (U+2502) > 2502 FE01; Chessboard box drawing (right); # BOX DRAWINGS LIGHT VERTICAL (U+2502) > 250C FE00; Chessboard box drawing; # BOX DRAWINGS LIGHT DOWN AND RIGHT (U+250C) > 2510 FE00; Chessboard box drawing; # BOX DRAWINGS LIGHT DOWN AND LEFT (U+2510) > 2514 FE00; Chessboard box drawing; # BOX DRAWINGS LIGHT UP AND RIGHT (U+2514) > 2518 FE00; Chessboard box drawing; # BOX DRAWINGS LIGHT UP AND LEFT (U+2518) > > These are more likely to be supported (by (fixed-width) fonts) in fallback than the ones you suggest. > They are also intended for box drawing (unlike the ones you suggest). 
>
> Perhaps also, since you exemplify also with double borders in your document:
> 2550 FE00; Chessboard box drawing (top); # BOX DRAWINGS DOUBLE HORIZONTAL (U+2550)
> 2550 FE01; Chessboard box drawing (bottom); # BOX DRAWINGS DOUBLE HORIZONTAL (U+2550)
> 2551 FE00; Chessboard box drawing (left); # BOX DRAWINGS DOUBLE VERTICAL (U+2551)
> 2551 FE01; Chessboard box drawing (right); # BOX DRAWINGS DOUBLE VERTICAL (U+2551)
> 2554 FE00; Chessboard box drawing; # BOX DRAWINGS DOUBLE DOWN AND RIGHT (U+2554)
> 2557 FE00; Chessboard box drawing; # BOX DRAWINGS DOUBLE DOWN AND LEFT (U+2557)
> 255A FE00; Chessboard box drawing; # BOX DRAWINGS DOUBLE UP AND RIGHT (U+255A)
> 255D FE00; Chessboard box drawing; # BOX DRAWINGS DOUBLE UP AND LEFT (U+255D)
>
> /Kent K
From everson at evertype.com Mon Apr 10 05:31:02 2017 From: everson at evertype.com (Michael Everson) Date: Mon, 10 Apr 2017 11:31:02 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <21251823.54972.1491814173063.JavaMail.open-xchange@app07.ox.hosteurope.de> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> <59398711.46744.1491481164274.JavaMail.open-xchange@app06.ox.hosteurope.de> <1070748515.52412.1491603430824.JavaMail.open-xchange@app06.ox.hosteurope.de> <21251823.54972.1491814173063.JavaMail.open-xchange@app07.ox.hosteurope.de> Message-ID: On 10 Apr 2017, at 09:49, Christoph Päper wrote: > If Unicode chess diagrams used VS-16 instead of VS-1 and VS-2, users could one day choose a font that fakes marble, wood, glass, steel or just some random color or even animation for pieces and
board squares. Since this kind of customization is a common feature in chess applications, I'd expect it to be a welcome feature for textual diagrams as well, even if it's not used (much) in print books. Within the web ecosystem that relies upon CSS for styling, authors and readers could very well and easily use different designs. With non-emoji chess characters, the differences would mostly be limited to glyph outlines. Well, no. > Is that "no reason"? Yes. If the UTC wants to make chess characters into emoji then they can do that. Garth and I are not asking for it. We're asking for interchangeability and stability in representing chess diagram data. This is not the same thing as what you are talking about, and so it is not relevant to the proposal. >> And risking that some consistent monochrome glyphs would be replaced with colorful pictures by overly aggressive systems is also something that should be avoided with the chess symbols. > > VS-15 should be better at that than any other variation selector. That just tells the glyph to be the text glyph from the code charts. That ignores completely the piece-on-square glyphs the proposal requires. You're talking about something irrelevant to the proposal. Christoph. It's not helpful. As Garth says: >> Look, this proposal is not about "Wouldn't it be a neat idea if we could make chess diagrams in text?" People had that neat idea before they had the neat idea for Unicode, or for computers for that matter. This is about removing a barrier to people using Unicode instead of various mutually-incompatible dingbat fonts for something they already regularly do. > > I understand that perfectly well. Currently, Unicode chess pieces are only well-suited for figurine notation, not for 2D diagrams. I even agree with the approach to use variation selectors. Thank you. > I just think that there would be significant positive synergy from reusing the infrastructure already established for emojis.
I think this is a huge distraction from the simple and robust proposal made. Emoji is a different kind of thing. >> One nice thing about the existing VS proposal is that it does not require any heuristics at all. Each square is explicitly marked as light or dark, with no guessing needed. > > The UI/UX drawback is that authors have to explicitly mark every field - unless you put the heuristics there. Yes. Authors should have to explicitly mark every field. That gives a consistent number of encoded characters for each square, which helps to facilitate fallback reading when the ligation is not available. OK, nothing new has been offered on this topic for a long time. Thank you for your support of the VS proposal, Christoph. Your supplementary proposals didn't make it better to achieve the goal: to remove the barrier to people using Unicode instead of various mutually-incompatible dingbat fonts for something they already regularly do. Michael Everson From christoph.paeper at crissov.de Mon Apr 10 05:40:25 2017 From: christoph.paeper at crissov.de (=?UTF-8?Q?Christoph_P=C3=A4per?=) Date: Mon, 10 Apr 2017 12:40:25 +0200 (CEST) Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <3A6FCC31-F335-40EE-B842-F690E413D3B1@evertype.com> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> <59398711.46744.1491481164274.JavaMail.open-xchange@app06.ox.hosteurope.de> <1070748515.52412.1491603430824.JavaMail.open-xchange@app06.ox.hosteurope.de> <3A6FCC31-F335-40EE-B842-F690E413D3B1@evertype.com> Message-ID:
<1214743651.55766.1491820825323.JavaMail.open-xchange@app07.ox.hosteurope.de> Michael Everson : > On 7 Apr 2017, at 23:17, Christoph Päper wrote: > > >> The only connection this has with emoji is that it uses the variation > >> selector system. > > > > As I've shown, that's not the *only* connection. > > Christoph, YOU ARE WRONG. Even if I were, nobody has proven that. Everybody is just shouting out their presumptions and prejudices, full of falsehoods. > Emoji has a special relationship with vendors and a particular implementation > environment. That is true. It does not mean that a) this environment would not be used to interchange chess diagrams nor b) parties interested in rendering textual chess diagrams couldn't take advantage of it and bend it to their requirements. > Vendors via the UTC look at symbol and pictograph and other characters and > decide if they want to give these symbols and pictographs and other characters > the special characteristic which implies generally colour rendering and > implies an obligation to supply input methods for those characters. Yes, Unicode's emojification process is still seriously broken. It's not an argument against reusing the underlying techniques, though. > nobody needs to send an 8 x 8 chessboard matrix in a tweet. Get it? Now you are kidding, right? With 68 chars remaining for comments, stuff like #chesspuzzle would be much improved if replies could include boards as well without resorting to screenshots (which requires rebuilding the diagram in your favorite chess app) or image editors, just copy and paste of text characters. Most serious players would continue to rely on more or less proprietary exchange formats within their preferred app, though. > Please stop trying to conflate emoji and chess characters. It is NOT, I think, > a solution which the UTC would agree to. I would oppose it in SC2. That's why I'm trying to convince you (but not just you) in this early stage.
> Two centuries of standard chess diagramming practice is all that's needed to > support. In a B2C-only publishing world perhaps. It's not what we're living in any more, fortunately. > >> I don't believe emoji are even necessarily fixed-width. > > > > In all existing implementations they are. > > That's not true. Could you please provide a counter example? The original Japanese sets did feature three space characters for full, half and another partial (1/3?) ideographic width, but none of the visible glyphs were less than fullwidth. > > They are even always square. > > Not always, and that's enough chaos. There is no standardization currently in > chess fonts. *Emojis* are always square. I didn't say anything about fonts used for chess diagrams here. > > Without the need for ZWJ sequences, > > Opentype fonts can employ their Contextual Alternates `calt` feature > > to select the correct background color in diagram notation: (...) > > (...) there's no legible fallback if the 'calt' features can't be invoked. In a standard 8 by 8 squares chess diagram using U+2654..F and U+2B1B/C, you would still have *at least* 32 squares (i.e. 50%) with explicit color shown. Emoji characters line up on the ideographic grid. Even if figurine glyphs had no distinctive background color, it would still be legible.
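The fallback arithmetic in the preceding paragraph can be checked with a small sketch. It assumes the scheme described there: pieces are the bare characters U+2654..U+265F, and only empty squares carry an explicit colour via U+2B1B (dark) and U+2B1C (light). The helper names (`is_dark`, `initial_position`, `render`, `explicit_colour_squares`) are hypothetical and not taken from any proposal document; the sketch does not model the variation-sequence proposal itself.

```python
# Sketch only: models a diagram where empty squares are U+2B1B / U+2B1C
# and occupied squares are plain chess piece characters (no background).
DARK_SQUARE = "\u2B1B"   # BLACK LARGE SQUARE
LIGHT_SQUARE = "\u2B1C"  # WHITE LARGE SQUARE

WHITE = {"K": "\u2654", "Q": "\u2655", "R": "\u2656",
         "B": "\u2657", "N": "\u2658", "P": "\u2659"}
BLACK = {"K": "\u265A", "Q": "\u265B", "R": "\u265C",
         "B": "\u265D", "N": "\u265E", "P": "\u265F"}


def is_dark(file_index, rank_index):
    """Square colour from 0-based file (a=0) and rank (1=0); a1 is dark."""
    return (file_index + rank_index) % 2 == 0


def initial_position():
    """Return {(file, rank): piece character} for the standard start."""
    position = {}
    for file_index, letter in enumerate("RNBQKBNR"):
        position[(file_index, 0)] = WHITE[letter]
        position[(file_index, 1)] = WHITE["P"]
        position[(file_index, 6)] = BLACK["P"]
        position[(file_index, 7)] = BLACK[letter]
    return position


def render(position):
    """Render rank 8 first; only empty squares show their colour."""
    rows = []
    for rank_index in range(7, -1, -1):
        row = []
        for file_index in range(8):
            piece = position.get((file_index, rank_index))
            if piece is None:
                dark = is_dark(file_index, rank_index)
                row.append(DARK_SQUARE if dark else LIGHT_SQUARE)
            else:
                row.append(piece)
        rows.append("".join(row))
    return "\n".join(rows)


def explicit_colour_squares(diagram):
    """Count squares whose colour is visible without any font feature."""
    return diagram.count(DARK_SQUARE) + diagram.count(LIGHT_SQUARE)


if __name__ == "__main__":
    diagram = render(initial_position())
    print(diagram)
    print(explicit_colour_squares(diagram), "of 64 squares show their colour")
```

With all 32 pieces on the board, 32 squares remain empty, which is the "at least 32" lower bound quoted above; every capture raises the count. Whether that makes a diagram legible without `calt` is the point under dispute in the thread, and the sketch takes no side on it.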
From everson at evertype.com Mon Apr 10 05:58:57 2017 From: everson at evertype.com (Michael Everson) Date: Mon, 10 Apr 2017 11:58:57 +0100 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <1214743651.55766.1491820825323.JavaMail.open-xchange@app07.ox.hosteurope.de> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> <59398711.46744.1491481164274.JavaMail.open-xchange@app06.ox.hosteurope.de> <1070748515.52412.1491603430824.JavaMail.open-xchange@app06.ox.hosteurope.de> <3A6FCC31-F335-40EE-B842-F690E413D3B1@evertype.com> <1214743651.55766.1491820825323.JavaMail.open-xchange@app07.ox.hosteurope.de> Message-ID: <8397C574-83FF-4C10-904E-30044712E945@evertype.com> On 10 Apr 2017, at 11:40, Christoph Päper wrote: > Even if I were [wrong], nobody has proven that. Everybody is just shouting out their presumptions and prejudices, full of falsehoods. I have stated that emoji is a different world. It brings with it specific implications for burdening vendors in a particular way. I am not having this simple, feasible, sensible, and effective proposal derailed by mixing it in with colourful emoji fonts. I have stated nothing false. >> Emoji has a special relationship with vendors and a particular implementation environment. > > That is true. It does not mean that > > a) this environment would not be used to interchange chess diagrams nor Emoji is for sending stuff to your friends via various messaging services. Chess diagrams have been set in plain type for going on two hundred years. That's what the proposal supports. That's all it supports.
It solves the problem of using the UCS to set such diagrams. That's it. > b) parties interested in rendering textual chess diagrams couldn't take advantage of it and bend it to their requirements. I've worked with vendors providing colour emoji glyphs and black-and-white emoji lists. Implementation is time-consuming and expensive. I just want standardized variation sequences for chess notation so that chess fonts can be sorted out. >> Vendors via the UTC look at symbol and pictograph and other characters and decide if they want to give these symbols and pictographs and other characters the special characteristic which implies generally colour rendering and implies an obligation to supply input methods for those characters. > > Yes, Unicode's emojification process is still seriously broken. It's not an argument against reusing the underlying techniques, though. I said it once already. Now I'm saying it again. Only the UTC assigns the emoji category to symbols. I'm not asking, and am not going to ask, the UTC to assign the emoji category to chess symbols. >> Please stop trying to conflate emoji and chess characters. It is NOT, I think, a solution which the UTC would agree to. I would oppose it in SC2. > > That's why I'm trying to convince you (but not just you) in this early stage. I'm not convinced and I'm not going to be convinced. The emoji VS would not solve the problem I have in any case. I need two VS characters, one for light squares and one for dark squares and the emoji VS only say "you can make it colourful". Emojification of chess characters is not the correct solution to the problem. >>> In all existing implementations they are. >> >> That's not true. > > Could you please provide a counter example? I've seen chess fonts that have free-standing chesspiece characters as well as chess characters on light squares and dark ones. > *Emojis* are always square. I didn't say anything about fonts used for chess diagrams here.
Square also does not mean "em-square sized" which is pretty much what you need for chess diagrams. That's all. Michael Everson From aleksey.tulinov at gmail.com Mon Apr 10 11:01:32 2017 From: aleksey.tulinov at gmail.com (Aleksey Tulinov) Date: Mon, 10 Apr 2017 19:01:32 +0300 Subject: Unicode vs. Unikod In-Reply-To: <86shlgk2pf.fsf@mimuw.edu.pl> References: <86shlgk2pf.fsf@mimuw.edu.pl> Message-ID: <2a8639a0-2753-4c49-bf44-ba65b9bb6f2b@gmail.com> On 04/10/2017 12:54 PM, Janusz S. Bień wrote: > Grammatical Dictionary of Polish contains only "unicode": > > http://sgjp.pl/leksemy/#73537/unicode > I'm deeply impressed that the dictionary contains grammar for registered trademarks. Google and Microsoft are also there, but not Oracle. I'm not confident I understand how that works. To compare to Cambridge Dictionary: http://dictionary.cambridge.org/dictionary/english/google?fallbackFrom=british-grammar Apparently "google" is a verb as in "to google", and this is why it's in the dictionary, but "Microsoft" and "Unicode" are missing. > This is probably the reason why, to my surprise, the word was > introduced also in some other Slavonic languages, e.g. > https://en.wiktionary.org/wiki/Unikod. > I believe "Юникод" in Russian is just a foreign word adopted by the language, and it's a russism, it's a way of saying "Unicode" in Russian (phonetically the same). So apparently the word "Юникод" was adopted as a noun, similarly to how "google" was adopted in English as a verb.
From petercon at microsoft.com Mon Apr 10 11:30:29 2017 From: petercon at microsoft.com (Peter Constable) Date: Mon, 10 Apr 2017 16:30:29 +0000 Subject: Coloured Punctuation and Annotation In-Reply-To: <4d6aa681-3731-b646-43bf-cc37a3b930de@ix.netcom.com> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <20170405091030.116883a5@JRWUBU2> <20170405224817.4149f845@JRWUBU2> <6883042b-8247-4c89-a47c-eb699cd6e590@ix.netcom.com> <0D5792F5-0D5E-47F4-BA9B-6FB4BC555BAA@evertype.com> <4d6aa681-3731-b646-43bf-cc37a3b930de@ix.netcom.com> Message-ID: From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Asmus Freytag Sent: Wednesday, April 5, 2017 5:30 PM >> There are certainly MSS (in many languages) where some punctuation made of dots have some of the dots red and some black. > Agreed, those would be a challenge to reproduce with standard font technology and in plain text. Not at all. This capability has existed in all major OS platforms for some years now. It is what has enabled the growth of interest in Unicode emoji, but it is by no means limited to Unicode emoji: it can be used for multi-color rendering of any text in ways defined within a font. The OpenType spec supports this through a few techniques: - Decomposing a glyph into several glyphs that are layered (z-ordered) with colour assignments. - Glyphs expressed as embedded colour bitmaps. - Glyphs expressed as embedded SVG. > But for the same reason, they are out of scope for plain text (and therefore a bit irrelevant to the current discussion). I agree, the rendering aspect is completely orthogonal to Unicode plain-text encoding. Peter From verdy_p at wanadoo.fr Mon Apr 10 12:14:14 2017 From: verdy_p at wanadoo.fr (Philippe Verdy) Date: Mon, 10 Apr 2017 19:14:14 +0200 Subject: Unicode vs. 
Unikod In-Reply-To: <2a8639a0-2753-4c49-bf44-ba65b9bb6f2b@gmail.com> References: <86shlgk2pf.fsf@mimuw.edu.pl> <2a8639a0-2753-4c49-bf44-ba65b9bb6f2b@gmail.com> Message-ID: I've seen "unicodé" or "uniencodé" used informally in various French articles or discussions (but not in dictionaries). But after all Apple is also a trademark, and this does not restrict people using the word for the fruit. Trademarks often reserve common words for use with specific products or company names in some country in some registered activities; this does not mean they take rights on everything or even in every place where they were also legally registered (possibly in the same domain of activity). "Unicoding" (and related verb forms without the necessary leading capital) can legitimately be found to just refer to the UCS or the ISO 10646 standard, not just the "Unicode Consortium" and its standard(s), activities or domain name/web site, or any derived application based on the UCS. There's some freedom here, even if one cannot use it freely to refer to another organization; anyway the term "Unicode" is now well known in lots of languages. It's also natural that people want to rewrite it in their native script. I just wonder why the Consortium did not document at least some correct orthography for use in scripts other than Latin, even if these alternate names are not registered. However there's no need to document variant orthographies such as "Unikod" which may be used in some other Latin-written language. There should be such listed terms in other scripts with at least Cyrillic, Greek, Georgian, Armenian, Ethiopic, Arabic, Hangul, Kana and possibly Bopomofo (I wonder if there's any way to write it with Han sinograms by composing a radical and phonetic strokes).
Even if these terms are not "standardized" and really supported, it would be convenient to find some external references (even if they are not fully conforming or criticizing some existing problems), just to know what other people are doing with the standard and how open it is really, even for fancy uses. As this standard wants to be universal, people will naturally challenge this openness and will want to reappropriate it partly. This is not a defect but a consequence of the fact that this standard is alive and productive, and can even accept some innovations while continuing to evolve and remain attractive. 2017-04-10 18:01 GMT+02:00 Aleksey Tulinov : > On 04/10/2017 12:54 PM, Janusz S. Bień wrote: > > > Grammatical Dictionary of Polish contains only "unicode": > > > > http://sgjp.pl/leksemy/#73537/unicode > > > > I'm deeply impressed that dictionary contains grammar for registered > trademarks. Google and Microsoft are also there, but not Oracle. I'm not > confident i understand how that works. > > To compare to Cambridge Dictionary: > > http://dictionary.cambridge.org/dictionary/english/google?fa > llbackFrom=british-grammar > > Apparently "google" is a verb as in "to google", and this is why it's in > the dictionary, but "Microsoft" and "Unicode" are missing. > > > This is probably the reason why, to my surprise, the word was > > introduced also in some other Slavonic languages, e.g. > > https://en.wiktionary.org/wiki/Unikod. > > > > I believe "Юникод" in Russian is just a foreign word adopted by language, > and it's a russism, it's a way of saying "Unicode" in Russian (phonetically > the same). So apparently word "Юникод" was adopted as a noun, similarly to > how "google" was adopted in English as a verb. > > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From unicode at unicode.org Mon Apr 10 13:32:21 2017 From: unicode at unicode.org (Peter Constable via Unicode) Date: Mon, 10 Apr 2017 18:32:21 +0000 Subject: Coloured Punctuation and Annotation In-Reply-To: <068DC35F-4A31-44FA-9EA7-ADDDDE06D450@evertype.com> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <20170405091030.116883a5@JRWUBU2> <20170405224817.4149f845@JRWUBU2> <9FCA9B1F-00D7-459E-8567-59589609A708@evertype.com> <20170406054107.20e40bd5@JRWUBU2> <068DC35F-4A31-44FA-9EA7-ADDDDE06D450@evertype.com> Message-ID: Michael, your two-tone effect can easily be added into your first font using COLR and CPAL tables, so that the one font can support a monochrome rendering that uses glyphs in which the swirls are fused with the letters, and can also support a poly-chrome rendering in which those glyphs are decomposed into separate glyphs that get layered on top of one another in an order you specify with different RGBA colours. Peter -----Original Message----- From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Michael Everson Sent: Thursday, April 6, 2017 5:41 AM To: unicode Unicode Discussion Subject: Re: Coloured Punctuation and Annotation > On 6 Apr 2017, at 05:41, Richard Wordingham wrote: > > On Thu, 6 Apr 2017 01:11:09 +0100 > Michael Everson wrote: > >> On 5 Apr 2017, at 22:48, Richard Wordingham >> wrote: >> >>> I tried to read it from UTS#51 'Unicode Emoji', which is not part of TUS, but I couldn't deduce that a font that enables U+10B99 PSALTER PAHLAVI SECTION MARK to have exactly two (as opposed to none or four) red dots is in breach of the guidelines therein. >> >> Kindly explain how ANY font could do this. > > Is this a trick question? No. Here is an example of a font available in two variants.
In one variant, all those grey swirls are fused to the letters, and it can all be printed in black or one colour ink. https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fcdn.myfonts.net%2Fs%2Faw%2Foriginal%2F255%2F0%2F131020.png&data=02%7C01%7Cpetercon%40microsoft.com%7Cd423eda2387c475363ef08d47ceb4b80%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636270797424696444&sdata=%2F64giVqctMwconsQVzFvIj7WPbOzNeQ%2F6npJUlIXaTc%3D&reserved=0 There is also a second set of fonts included which separates the swirls from the letters, and those can be used in typesetting to get the two-colour effect you see here. That can't really be done using standard encoding. You'd probably see IIVVOORRYY in the backing store for that word, with every other letter being set in the letter font and the swirl font. Emoji-style colour fonts use other mechanisms for colour. Michael Everson From unicode at unicode.org Mon Apr 10 13:21:36 2017 From: unicode at unicode.org (Sarasvati via Unicode) Date: Mon, 10 Apr 2017 13:21:36 -0500 Subject: Header changes for the Unicode Mail List Message-ID: <201704101821.v3AILaPm012540@sarasvati.unicode.org> Hello everyone, You may notice a change in the way mail headers are handled for this mail list, and it might affect Reply and Reply All functionality in whatever client you are using to read and respond to e-mail. This change is related to DMARC handling on some sites, and how that inter-operates with mail list software.
Optional reading: https://dmarc.org/ https://www.google.com/?hl=en&gws_rd=ssl#hl=en&q=dmarc+and+mailing+lists From unicode at unicode.org Mon Apr 10 13:56:09 2017 From: unicode at unicode.org (Michael Everson via Unicode) Date: Mon, 10 Apr 2017 19:56:09 +0100 Subject: Coloured Punctuation and Annotation In-Reply-To: References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <20170405091030.116883a5@JRWUBU2> <20170405224817.4149f845@JRWUBU2> <9FCA9B1F-00D7-459E-8567-59589609A708@evertype.com> <20170406054107.20e40bd5@JRWUBU2> <068DC35F-4A31-44FA-9EA7-ADDDDE06D450@evertype.com> Message-ID: <8B7DE90F-026F-471C-A7DD-A5DBEB70485B@evertype.com> On 10 Apr 2017, at 19:32, Peter Constable via Unicode wrote: > > Michael, your two-tone effect can easily be added into your first font using COLR and CPAL tables, so that the one font can support a monochrome rendering that uses glyphs in which the swirls are fused with the letters, and can also support a poly-chrome rendering in which those glyphs are decomposed into separate glyphs that get layered on top of one another in an order you specify with different RGBA colours. Thank you, but I don't have any need to represent chess diagram glyphs in colour. I also don't have any font editing tools which edit COLR and CPAL tables in colour.
Michael From unicode at unicode.org Mon Apr 10 14:00:53 2017 From: unicode at unicode.org (Michael Everson via Unicode) Date: Mon, 10 Apr 2017 20:00:53 +0100 Subject: Coloured Punctuation and Annotation In-Reply-To: References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <20170405091030.116883a5@JRWUBU2> <20170405224817.4149f845@JRWUBU2> <6883042b-8247-4c89-a47c-eb699cd6e590@ix.netcom.com> <0D5792F5-0D5E-47F4-BA9B-6FB4BC555BAA@evertype.com> <4d6aa681-3731-b646-43bf-cc37a3b930de@ix.netcom.com> Message-ID: <44CC55B7-D9E5-45D9-BB43-809E2DF43CD7@evertype.com> On 10 Apr 2017, at 17:30, Peter Constable wrote: Sorry, Peter. I didn't realize you weren't talking about chess fonts. Michael From unicode at unicode.org Mon Apr 10 15:39:04 2017 From: unicode at unicode.org (Asmus Freytag (c) via Unicode) Date: Mon, 10 Apr 2017 13:39:04 -0700 Subject: Coloured Punctuation and Annotation In-Reply-To: References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <20170405091030.116883a5@JRWUBU2> <20170405224817.4149f845@JRWUBU2> <6883042b-8247-4c89-a47c-eb699cd6e590@ix.netcom.com> <0D5792F5-0D5E-47F4-BA9B-6FB4BC555BAA@evertype.com> <4d6aa681-3731-b646-43bf-cc37a3b930de@ix.netcom.com> Message-ID: <3d6a9198-782b-7966-c057-de8036835c33@ix.netcom.com> On 4/10/2017 9:30 AM, Peter Constable wrote: > From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Asmus Freytag > Sent: Wednesday, April 5, 2017 5:30 PM > >>> There are certainly MSS (in many languages) where some punctuation made of dots have some of the dots red and some black.
>> Agreed, those would be a challenge to reproduce with standard font technology and in plain text. > Not at all. This capability has existed in all major OS platforms for some years now. It may be in the platforms, but of the few clients I've tried this with, only one is reliably supporting this. > It is what has enabled the growth of interest in Unicode emoji, but it is by no means limited to Unicode emoji: it can be used for multi-color rendering of any text in ways defined within a font. The OpenType spec supports this through a few techniques: > > - Decomposing a glyph into several glyphs that are layered (z-ordered) with colour assignments. > - Glyphs expressed as embedded colour bitmaps. > - Glyphs expressed as embedded SVG. Khaled gave a very nice demonstration of that on this list (which allowed me to test this). > >> But for the same reason, they are out of scope for plain text (and therefore a bit irrelevant to the current discussion). > I agree, the rendering aspect is completely orthogonal to Unicode plain-text encoding. The problem with multicolored fonts would be the integration into font color selection via styling. http://www.amirifont.org/fatiha-colored.html If you select a section of this text, the black ink will invert as you select it, but the other colors remain the same, which is different from selecting a multicolored image or different from selecting multiple runs of fonts in different colors. I wonder whether high-end tools like Indesign would be able to allow styling of individual color levels. For rendering emoji colors via fonts that wouldn't matter, but for the kind of annotated text example, it could be interesting to be able to tweak these layer colors. 
A./ > > > Peter From unicode at unicode.org Mon Apr 10 16:56:02 2017 From: unicode at unicode.org (Peter Constable via Unicode) Date: Mon, 10 Apr 2017 21:56:02 +0000 Subject: Coloured Punctuation and Annotation In-Reply-To: <3d6a9198-782b-7966-c057-de8036835c33@ix.netcom.com> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <20170403203355.6cbfc184@JRWUBU2> <20170405091030.116883a5@JRWUBU2> <20170405224817.4149f845@JRWUBU2> <6883042b-8247-4c89-a47c-eb699cd6e590@ix.netcom.com> <0D5792F5-0D5E-47F4-BA9B-6FB4BC555BAA@evertype.com> <4d6aa681-3731-b646-43bf-cc37a3b930de@ix.netcom.com> <3d6a9198-782b-7966-c057-de8036835c33@ix.netcom.com> Message-ID: The color palette entries (CPAL) used for COLR or SVG can potentially be customized by an application - whether for user customization or to fit some context (such as selection). Peter -----Original Message----- From: Asmus Freytag (c) [mailto:asmusf at ix.netcom.com] Sent: Monday, April 10, 2017 1:39 PM To: Peter Constable ; unicode at unicode.org Subject: Re: Coloured Punctuation and Annotation On 4/10/2017 9:30 AM, Peter Constable wrote: > From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Asmus > Freytag > Sent: Wednesday, April 5, 2017 5:30 PM > >>> There are certainly MSS (in many languages) where some punctuation made of dots have some of the dots red and some black. >> Agreed, those would be a challenge to reproduce with standard font technology and in plain text. > Not at all. This capability has existed in all major OS platforms for some years now. It may be in the platforms, but of the few clients I've tried this with, only one is reliably supporting this.
> It is what has enabled the growth of interest in Unicode emoji, but it is by no means limited to Unicode emoji: it can be used for multi-color rendering of any text in ways defined within a font. The OpenType spec supports this through a few techniques: > > - Decomposing a glyph into several glyphs that are layered (z-ordered) with colour assignments. > - Glyphs expressed as embedded colour bitmaps. > - Glyphs expressed as embedded SVG. Khaled gave a very nice demonstration of that on this list (which allowed me to test this). > >> But for the same reason, they are out of scope for plain text (and therefore a bit irrelevant to the current discussion). > I agree, the rendering aspect is completely orthogonal to Unicode plain-text encoding. The problem with multicolored fonts would be the integration into font color selection via styling. https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.amirifont.org%2Ffatiha-colored.html&data=02%7C01%7Cpetercon%40microsoft.com%7C296e985cc48947a9f25808d48051a201%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636274535476985547&sdata=%2Bj4%2FEA9RA8j4iIjTZYBrGH36BbSpenxKfOFy5uvWyXs%3D&reserved=0 If you select a section of this text, the black ink will invert as you select it, but the other colors remain the same, which is different from selecting a multicolored image or different from selecting multiple runs of fonts in different colors. I wonder whether high-end tools like Indesign would be able to allow styling of individual color levels. For rendering emoji colors via fonts that wouldn't matter, but for the kind of annotated text example, it could be interesting to be able to tweak these layer colors. 
A./ > > > Peter From unicode at unicode.org Mon Apr 10 17:08:14 2017 From: unicode at unicode.org (Peter Constable via Unicode) Date: Mon, 10 Apr 2017 22:08:14 +0000 Subject: Coloured Punctuation and Annotation In-Reply-To: <4202283.53811.1491503991391.JavaMail.defaultUser@defaultHost> References: <4202283.53811.1491503991391.JavaMail.defaultUser@defaultHost> Message-ID: William: Michael's scenario doesn't require a special palette index value such as you propose since (i) he could implement a font with alternate palettes to provide different colouring options of his choosing, and (ii) an app can always expose customization options to allow the user to customize any of the palette entries that are being used, even on a character-by-character basis if the app really wanted to. Moreover, defining palette index 0xFFFE with a special meaning would be a breaking change that could negatively impact existing implementations. Also, it would create a potential ambiguity about what colour to use: whereas text drawing operations _always_ have a foreground colour specified, there is no convention for specifying a "first decoration colour". For these reasons, this is not going to happen. Peter -----Original Message----- From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of William_J_G Overington Sent: Thursday, April 6, 2017 11:40 AM To: everson at evertype.com; richard.wordingham at ntlworld.com; unicode at unicode.org Subject: Re: Coloured Punctuation and Annotation Michael Everson wrote: > No. Here is an example of a font available in two variants. In one variant, all those grey swirls are fused to the letters, and it can all be printed in black or one colour ink. 
> http://cdn.myfonts.net/s/aw/original/255/0/131020.png > There is also a second set of fonts included which separates the swirls from the letters, and those can be used in typesetting to get the two-colour effect you see here. That can't really be done using standard encoding. You'd probably see IIVVOORRYY in the backing store for that word, with every other letter being set in the letter font and the swirl font. Richard Wordingham mentioned the following. > The third glyph would use 'index' 0xFFFF to specify that it be displayed in the foreground colour. If the OpenType specification were augmented so that 'index' 0xFFFE were to specify that the appropriate part of the glyph be displayed in the "first decoration colour", a colour specified in the application program and not in the font; and an application program were augmented so that an end user were able to choose first decoration colour as well as choosing foreground colour, then would that produce the result for which Michael is looking? William Overington Thursday 6 April 2017 From unicode at unicode.org Mon Apr 10 17:10:39 2017 From: unicode at unicode.org (Aleksey Tulinov via Unicode) Date: Tue, 11 Apr 2017 01:10:39 +0300 Subject: Unicode vs. Unikod In-Reply-To: References: <86shlgk2pf.fsf@mimuw.edu.pl> <2a8639a0-2753-4c49-bf44-ba65b9bb6f2b@gmail.com> Message-ID: On 04/10/2017 08:14 PM, Philippe Verdy wrote: > "Unicoding" (and related verb forms without the necessary leading > capital) can legitimately be found to just refer to the UCS or the ISO > 10646 standard, not just the "Unicode Consortium" and its standard(s), > activities or domain name/web site, or any derived application based on > the UCS.
> > There's some freedom here, even if one cannot use it freely to refer > to another organization; anyway, the term "Unicode" is now well-known in > lots of languages. It's also natural that people want to rewrite it > in their native script. > It's hard to use a foreign word in a language until the word is adopted. Russians don't do "ing", there are different rules in the language, so the first step is adapting it to "Юникод": most notably, there is no vowel at the end of the word. Then this word can be transformed into something different, e.g. "юникодить" (verb, similar to "to unicode"). I don't think it's just a desire to rewrite a word in native script, it's how the Russian language works, it's not just a matter of spelling. "Юникод" is a Russian word, it's not just Cyrillic, it belongs to the Russian language, it does follow Russian language rules (the word "Unicode" in Latin doesn't). > I just wonder why the Consortium did not document at least some > correct orthography for use in other script than Latin, even if these > alternate names are not registered. > It's probably this link: http://unicode.org/standard/UnicodeTranscriptions.html It says "Юникод" in Russian, which is fine. But the Russian translation of "What is Unicode" (http://www.unicode.org/standard/translations/russian.html) uses the original word "Unicode", and that's also fine. Both words mean the same thing, it's all good. From unicode at unicode.org Mon Apr 10 17:13:05 2017 From: unicode at unicode.org (Michael Everson via Unicode) Date: Mon, 10 Apr 2017 23:13:05 +0100 Subject: Coloured Punctuation and Annotation In-Reply-To: References: <4202283.53811.1491503991391.JavaMail.defaultUser@defaultHost> Message-ID: Michael isn't trying to make any coloured fonts.
Michael From unicode at unicode.org Mon Apr 10 18:43:12 2017 From: unicode at unicode.org (Ben Morphett via Unicode) Date: Mon, 10 Apr 2017 23:43:12 +0000 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <04fa59f3-e249-ab6d-b08e-64bf58b3ff07@ix.netcom.com> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <59398711.46744.1491481164274.JavaMail.open-xchange@app06.ox.hosteurope.de> <3A6FCC31-F335-40EE-B842-F690E413D3B1@evertype.com> <43406988-1699-dbaa-89fc-da37abe796fd@ix.netcom.com> <39404893-7190-4B8A-94B4-A0549EDDCC8E@evertype.com> <04fa59f3-e249-ab6d-b08e-64bf58b3ff07@ix.netcom.com> Message-ID: Oh look, I was just waiting for someone to mention Adolf Hitler, and then for someone to invoke Godwin's observation, and for the whole thing to go up in a puff of smoke. Whew. From: Asmus Freytag [mailto:asmusf at ix.netcom.com] Sent: Sunday, April 9, 2017 7:24 AM To: unicode at unicode.org Subject: Re: Proposal to add standardized variation sequences for chess notation On 4/8/2017 12:20 PM, Michael Everson wrote: I can quote your own message just posted 3 hours ago?
YOU REALLY USED the term "game" and wanted developers to use fonts for them. Please learn to read. Time for Sarasvati to pull the plug on this thread? A./ -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Mon Apr 10 19:18:48 2017 From: unicode at unicode.org (Peter Constable via Unicode) Date: Tue, 11 Apr 2017 00:18:48 +0000 Subject: PETSCII mapping? In-Reply-To: References: <38d70a68-aabe-a6d1-50cf-cbdf2f92b88f@ix.netcom.com> Message-ID: From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Rebecca T Sent: Wednesday, April 5, 2017 2:26 PM > As time goes on, "not in widespread use" will become a flimsier and flimsier > argument against inclusion - why isn't there a larger community of PETSCII > enthusiasts? Partially because the only way to share PETSCII is through images! > The consortium (passively or actively) prevents communication through exclusion > and then uses the lack of communication as a justification against inclusion - > it's a poor, tautological argument, and it won't serve the consortium > long-term. > > Simply put, we need new criteria for inclusion... Your assertions are based on assumptions that simply aren't valid. Unicode regularly encodes characters for things that are not in widespread use, and that fit the intended scope of the Standard. If someone can demonstrate that there are users who _would_ interchange texts that currently cannot be represented in Unicode for lack of appropriate characters, then that certainly can be considered. But the fact that some text element was represented in some legacy system does not alone comprise an adequate basis for encoding. And as Asmus said elsewhere in this thread, > Nothing gets decided by the UTC unless there's a proposal on the table. Also, as Elias said, > Wouldn't it make sense to get in touch with active Commodore 64 communities > to find out how people deal with this today?
This is key: if there isn't an on-going interest among some user community for interchanging the putative characters in Unicode, then that would weaken a case for encoding. > we must weigh a character's merits and usability on its own. (does > it fill a gap in communication? Will it be used?) That is already and has always been a basis on which characters get encoded in Unicode. Peter -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Mon Apr 10 23:22:52 2017 From: unicode at unicode.org (Jonathan Rosenne via Unicode) Date: Tue, 11 Apr 2017 04:22:52 +0000 Subject: Unicode vs. Unikod In-Reply-To: References: <86shlgk2pf.fsf@mimuw.edu.pl> <2a8639a0-2753-4c49-bf44-ba65b9bb6f2b@gmail.com> Message-ID: Regarding http://unicode.org/standard/UnicodeTranscriptions.html the Hebrew (pointed) is wrong, the Holam point should be above the Vav. Attached are Word and PDF documents that appear correct on my computer, and a PNG. Best Regards, Jonathan Rosenne -------------- next part -------------- A non-text attachment was scrubbed... Name: Unicode (Pointed Hebrew).pdf Type: application/pdf Size: 8649 bytes Desc: Unicode (Pointed Hebrew).pdf URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Unicode (Pointed Hebrew).png Type: image/png Size: 615 bytes Desc: Unicode (Pointed Hebrew).png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Unicode (Pointed Hebrew).docx Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document Size: 12973 bytes Desc: Unicode (Pointed Hebrew).docx URL: From unicode at unicode.org Tue Apr 11 00:23:27 2017 From: unicode at unicode.org (Martin J. Dürst via Unicode) Date: Tue, 11 Apr 2017 14:23:27 +0900 Subject: Unicode vs.
Unikod In-Reply-To: <86shlgk2pf.fsf@mimuw.edu.pl> References: <86shlgk2pf.fsf@mimuw.edu.pl> Message-ID: Hello Janusz, I think you should report this problem to http://www.unicode.org/reporting.html. That way, it gets tracked appropriately. This list is for discussion, not for bug fixes. Regards, Martin. On 2017/04/10 18:54, Janusz S. Bień wrote: > > This is a long overdue issue, but better late than never. > > To make a long story short, I think that the word "Unikod" should not > be used in the Polish translation of "What is Unicode": > > http://www.unicode.org/standard/translations/polish.html > > The word "Unikod", to the best of my knowledge, has been coined long > ago by Piotr Trzcionkowski, who registered also the domain > www.unikod.pl used to advocate Unicode in Poland. > > "Unicode" as a trademark should not be translated, for me this is > quite obvious. > > This is actually the case in almost all language versions of "What is > Unicode" using the Latin script (with the exception of Esperanto and > Lithuanian, where it can be probably justified by some grammar rules, > and Polish). This also seems to be the case in various language > versions of Wikipedia (I've checked only some of them) with the > exception of the Polish one which uses "Unikod" as the primary entry. > > The occurrence of "Unikod" on the Unicode site may be interpreted as an > official acceptance of this equivalent. I hope this is not the case. I > would like to clarify the matter before engaging myself in a > discussion about introducing the "Unicode" primary entry in Polish > Wikipedia. > > You can check the usage of "Unicode" and "Unikod" in Polish not only > with Google but also in the National Corpus of Polish: > > http://nkjp.pl/ > > There are 786 occurrences of "Unicode" coming mainly from published > books and 102 occurrences of "Unikod", mainly in Usenet postings and > Wikipedia discussions.
> > Grammatical Dictionary of Polish contains only "unicode": > > http://sgjp.pl/leksemy/#73537/unicode > > Polish versions of Windows use "Unicode". A localization dictionary > > http://www.btinfodictionary.com/ > > also preserves "Unicode". > > Actually I don't mind using "Unikod" (or better "unikod") informally, > as e.g. spelling "Unikodem" is simpler than "Unicode'em" (instrumental > singular) and "Unikodzie" looks better in Polish than "Unicodzie" > (locative singular), moreover there is no doubt how to pronounce it. > This is probably the reason why, to my surprise, the word was > introduced also in some other Slavonic languages, e.g. > https://en.wiktionary.org/wiki/Unikod. > > My point is only that both "What is Unicode" and Polish Wikipedia > primary entry should use the original spelling. > > Best regards > > Janusz > -- Prof. Dr.sc. Martin J. Dürst Department of Intelligent Information Technology College of Science and Engineering Aoyama Gakuin University Fuchinobe 5-1-10, Chuo-ku, Sagamihara 252-5258 Japan From unicode at unicode.org Tue Apr 11 05:29:25 2017 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Tue, 11 Apr 2017 12:29:25 +0200 Subject: Unicode vs. Unikod In-Reply-To: References: <86shlgk2pf.fsf@mimuw.edu.pl> <2a8639a0-2753-4c49-bf44-ba65b9bb6f2b@gmail.com> Message-ID: 2017-04-11 0:10 GMT+02:00 Aleksey Tulinov : > It's probably this link: http://unicode.org/standard/UnicodeTranscriptions.html This page is hard to find, I didn't know where it was linked from until I saw it (referenced by "What is Unicode?") -------------- next part -------------- An HTML attachment was scrubbed...
URL: From unicode at unicode.org Tue Apr 11 08:04:55 2017 From: unicode at unicode.org (Kent Karlsson via Unicode) Date: Tue, 11 Apr 2017 15:04:55 +0200 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <7249D55E-B774-4079-A212-0E72D81C4966@evertype.com> Message-ID: Den 2017-04-10 12:19, skrev "Michael Everson" : > I believe the box drawing characters are for drawing boxes Which is exactly what you are doing. > and grids on > computer terminals, which is not the same thing as scoring a line around a set > of 64 graphic images. No, that is why I put in variation selectors. The glyphic variation selected would in my judgement fall well within the "box drawing semantics" (if you like) of these characters. In addition, thinking ahead, it is not at all unlikely that someone might want to divide a chess board with a horizontal mid-line, or for that matter a vertical mid-line (e.g. for "double chess"), or even quadrants. And then, ta-da, there are already box-drawing characters for doing just that (even when there is a small gap between the board and the border). (I'm not suggesting to add variation selector sequences for /those/ box drawing characters, because I don't /know/ there is a use-case for mid-lines in chess board layout, but I'm saying there might be.) > I don't want to get mixed up in using the box-drawing > characters. The characters which I have chosen work fine and to my mind suit > the application better. They "work" (of course), no font renderer or font editor is "smart" enough to "see" that you are going quite a bit (in my judgement) outside of the acceptable glyph variability for the characters you (so far) opted for for chess box drawing. (Other relevant, and non-glyph, properties being the same between the box drawing and block chars.) That the "block characters" are pure crap (which they are), does not mean that you can co-opt them for (slightly) "variant" box drawing.
> I also don't want to complicate chess fonts by having to have multiple choices > within a font for bordering. For one thing, single-rule and double-rule > bordering is by no means the gamut of possibility. You are not wanting "emoji" style borders, I'm sure. But some slight "ornate" style would be fine for the "box drawing" chars (even without variation selectors). The "single" should still be single, though, and the "double" be double. So triple (etc.) is out. I think single/double line border should be a decision by the "author"/ "editor", and not the font maker. Imagine accompanying text saying "the double bordered one is ". > Chess fonts do not have to be swiss-army knives. I don't see that I have asked for that. B.t.w., I see you don't have 1-8, a-h labels on the boards... It might be worth mentioning that FULLWIDTH a-h should work fine as labels (them being em-wide). /Kent K From unicode at unicode.org Tue Apr 11 08:19:29 2017 From: unicode at unicode.org (William_J_G Overington via Unicode) Date: Tue, 11 Apr 2017 14:19:29 +0100 (BST) Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: <10671671.8753.1491645811693.JavaMail.defaultUser@defaultHost> References: <4783FDDC-4F0B-4FE2-ABCA-09A09884C011@evertype.com> <20170402095322.17526d87@JRWUBU2> <16625622-2E1B-4053-A428-CAF97F7916F3@evertype.com> <20170402172710.54c37ad2@JRWUBU2> <2e5750ee-c110-2b15-7e7e-cfc166167ba8@ix.netcom.com> <7A9A7F35-3F4E-4C38-AA36-136399111271@evertype.com> <742647d6-75f8-2f59-4b60-75a67ea73572@ix.netcom.com> <5062E7FE-57DA-49A7-89C1-776D6CDE2E61@evertype.com> <915358C1-319D-4494-A915-2FAA557F8840@evertype.com> <59398711.46744.1491481164274.JavaMail.open-xchange@app06.ox.hosteurope.de> <1070748515.52412.1491603430824.JavaMail.open-xchange@app06.ox.hosteurope.de> <3A6FCC31-F335-40EE-B842-F690E413D3B1@evertype.com> <10671671.8753.1491645811693.JavaMail.defaultUser@defaultHost> Message-ID: 
<5155490.32560.1491916769077.JavaMail.defaultUser@defaultHost> On Saturday 8 April 2017 I wrote: > I have made an OpenType font that implements Michael's proposed format and the extension of having variation selectors for the border units that Michael kindly added during the discussion. > I have published the font and the font is available, free, from the following forum thread. > http://forum.high-logic.com/viewtopic.php?f=10&t=7033 The font is Quest text 2017. I have now sent, as an email attachment, (a copy of) the font to the British Library for Legal Deposit and I have received an email Receipt of deposit from the British Library. This post in this mailing list is so that the continued conservation of the font becomes recorded in the archive of this mailing list. William Overington Tuesday 11 April 2017 From unicode at unicode.org Tue Apr 11 10:44:12 2017 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Tue, 11 Apr 2017 17:44:12 +0200 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: <7249D55E-B774-4079-A212-0E72D81C4966@evertype.com> Message-ID: 2017-04-11 15:04 GMT+02:00 Kent Karlsson via Unicode : > > Den 2017-04-10 12:19, skrev "Michael Everson" : > > > I believe the box drawing characters are for drawing boxes > > Which is exactly what you are doing. > > > and grids on > > computer terminals, which is not the same thing as scoring a line around > a set > > of 64 graphic images. > > No, that is why I put in variation selectors. The glyphic variation > selected would in my judgement fall well within the "box drawing semantics" > (if you like) of these characters. Some Asian chess boards include also diagonal lines or dots on top of their crossing (notably 9x9 boards are subdivided into nine 3x3 subgroups by such dots). 
These chess boards do not alternate white and black "squares"; beside this, the cells may also be rectangular (longer vertically than horizontally) however such metric is not so important, as long as all cells have coherent sizes and can fit the pieces (which are flat like domino tiles, but not laid vertically on top of the table, and where pieces use symbols or sinograms instead of 3D head sculptures). I've not seen any box drawing character with such dots or all the needed diagonals. -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Tue Apr 11 22:14:51 2017 From: unicode at unicode.org (Garth Wallace via Unicode) Date: Tue, 11 Apr 2017 20:14:51 -0700 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: <7249D55E-B774-4079-A212-0E72D81C4966@evertype.com> Message-ID: On Tue, Apr 11, 2017 at 6:04 AM, Kent Karlsson via Unicode < unicode at unicode.org> wrote: > > Den 2017-04-10 12:19, skrev "Michael Everson" : > > > I don't want to get mixed up in using the box-drawing > > characters. The characters which I have chosen work fine and to my mind > suit > > the application better. > > They "work" (of course), no font renderer or font editor is "smart" enough > to "see" that you are going quite a bit (in my judgement) outside of the > acceptable glyph variability for the characters you (so far) opted for > for chess box drawing. (Other relevant, and non-glyph, properties being > the same between the box drawing and block chars.) > > That the "block characters" are pure crap (which they are), does not > mean that you can co-opt them for (slightly) "variant" box drawing. > > > I also don't want to complicate chess fonts by having to have multiple > choices > > within a font for bordering. For one thing, single-rule and double-rule > > bordering is by no means the gamut of possibility. > > You are not wanting "emoji" style borders, I'm sure.
But some slight > "ornate" style would be fine for the "box drawing" chars (even without > variation selectors). The "single" should still be single, though, > and the "double" be double. So triple (etc.) is out. > One salient feature the Block Elements have that the Box Drawing characters do not: distinct LEFT and RIGHT verticals, and LOWER and UPPER horizontals. The double frame typically consists of a thin line and a thicker line, with one on the inside and one on the outside, so left and right verticals are not interchangeable. Even when a single frame is used, it is important for spacing, since the frame should be flush against the board. I think single/double line border should be a decision by the "author"/ > "editor", and not the font maker. Imagine accompanying text saying > "the double bordered one is ". > I'm not aware of any usage like this, and while it's not impossible it doesn't seem likely since there are more common (and reliable) ways of referring to specific diagrams, such as numbering. In my experience the choice of single or double lined frames is a matter of style. The presence or lack of frame elements, however, can be semantic. It is common for "vertical cylinder" boards in fairy chess, for example, to lack the left and right frame elements to show that there is no barrier there (the first and last files are treated as adjacent). Likewise, the (less common) "horizontal cylinder" lacks top and bottom frame elements, and the "anchor ring" (torus) lacks any frame at all. B.t.w., I see you don't have 1-8, a-h labels on the boards... It might be > worth mentioning that FULLWIDTH a-h should work fine as labels (them being > em-wide). > That would work. I'm not sure it's necessary to mention. A great many diagrams are not labeled at all. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From unicode at unicode.org Tue Apr 11 23:12:24 2017 From: unicode at unicode.org (Garth Wallace via Unicode) Date: Tue, 11 Apr 2017 21:12:24 -0700 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: <7249D55E-B774-4079-A212-0E72D81C4966@evertype.com> Message-ID: On Tue, Apr 11, 2017 at 8:44 AM, Philippe Verdy via Unicode < unicode at unicode.org> wrote: > > > 2017-04-11 15:04 GMT+02:00 Kent Karlsson via Unicode > : > >> >> Den 2017-04-10 12:19, skrev "Michael Everson" : >> >> > I believe the box drawing characters are for drawing boxes >> >> Which is exactly what you are doing. >> >> > and grids on >> > computer terminals, which is not the same thing as scoring a line >> around a set >> > of 64 graphic images. >> >> No, that is why I put in variation selectors. The glyphic variation >> selected would in my judgement fall well within the "box drawing >> semantics" >> (if you like) of these characters. > > > Some Asian chess boards include also diagonal lines or dots on top of > their crossing (notably 9x9 boards are subdivided into nine 3x3 subgroups > by such dots). These chess boards do not alternate white and black > "squares" ; beside this, the cells may also be rectangular (longer > vertically than horizontally) however such metric is not so important, as > long as all cells have coherent sizes and can fit the pieces (which are > flat like domino tiles, but not laid vertically on top of the table, and > where pieces use symbols or sinograms instead of 3D head sculptures). > Shogi and Xiangqi diagrams are very different from Western chess diagrams, and necessarily outside of the scope of this proposal. There is no unified solution because the problems to be solved are different. Shogi diagrams are uncheckered (as Shogi boards are), with grid-lines to separate the spaces; traditionally, chess diagrams use the contrast of dark and light squares to distinguish spaces with no grid lines. 
They may, but do not have to, have dots at some intersections (these mark starting and promotion zones). Graphical diagrams may show images of pieces (pentagonal, with names written in kanji), but typeset diagrams use abbreviations of the piece names as CJK ideographs or kana: e.g. the gold general is 金, and the promoted pawn is と. Instead of "black" and "white", the pieces belonging to the sente player are displayed upright and those belonging to the gote player are rotated 180°. Any proposal for Shogi would have to deal with that. Xiangqi diagrams are even less like chess diagrams, since Xiangqi is played on the intersections of the board grid, not the spaces. This (and the closely related Korean game of Janggi) is the one you're thinking of with diagonals. Pieces are represented by CJK ideographs in circles (sometimes octagons in the case of Janggi). The primary issue for typesetting Western chess diagrams is dark squares with and without piece symbols. This issue is irrelevant to East Asian chesslike games. I've not seen any box drawing character with such dots or all the needed > diagonals. > China included some extensions to the Box Drawing characters in an early proposal for Xiangqi characters <http://www.unicode.org/L2/L2010/10368-n3910.pdf>. AIUI that was shot down <http://www.unicode.org/L2/L2010/10463-chinese-chess.pdf>. China seems to have dropped the matter. Later Xiangqi proposals by Andrew West focused on the circled ideographs and did not pursue new diagram drawing characters, and were eventually successful. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From unicode at unicode.org Tue Apr 11 23:58:22 2017 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Wed, 12 Apr 2017 06:58:22 +0200 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: References: <7249D55E-B774-4079-A212-0E72D81C4966@evertype.com> Message-ID: 2017-04-12 6:12 GMT+02:00 Garth Wallace : > On Tue, Apr 11, 2017 at 8:44 AM, Philippe Verdy via Unicode < > unicode at unicode.org> wrote: > >> >> >> 2017-04-11 15:04 GMT+02:00 Kent Karlsson via Unicode > >: >> >>> >>> Den 2017-04-10 12:19, skrev "Michael Everson" : >>> >>> > I believe the box drawing characters are for drawing boxes >>> >>> Which is exactly what you are doing. >>> >>> > and grids on >>> > computer terminals, which is not the same thing as scoring a line >>> around a set >>> > of 64 graphic images. >>> >>> No, that is why I put in variation selectors. The glyphic variation >>> selected would in my judgement fall well within the "box drawing >>> semantics" >>> (if you like) of these characters. >> >> >> Some Asian chess boards include also diagonal lines or dots on top of >> their crossing (notably 9x9 boards are subdivided into nine 3x3 subgroups >> by such dots). These chess boards do not alternate white and black >> "squares" ; beside this, the cells may also be rectangular (longer >> vertically than horizontally) however such metric is not so important, as >> long as all cells have coherent sizes and can fit the pieces (which are >> flat like domino tiles, but not laid vertically on top of the table, and >> where pieces use symbols or sinograms instead of 3D head sculptures). >> > > Shogi and Xiangqi diagrams are very different from Western chess diagrams, > and necessarily outside of the scope of this proposal. There is no unified > solution because the problems to be solved are different. > They are not out of scope, given that both games have wellknown western variants using pieces like in chess. 
And chess is also played with Shogi or Xiangqi boards and pieces. "Human pieces" are also used, as in chess, with people playing the characters on the ground. Xiangqi and Shogi pieces also do not necessarily use cursive serif sinograms; most players now use simplified non-serif sinograms, and there are also westernized versions using letters or symbols. The traditional narrow pentagonal form of the pieces (used with traditional boards that have rectangular cells) is frequently replaced by square, octagonal or circular pieces (with symbols centered on the sinographic composition squares). There are also triangular pieces (played on board variants with a triangular grid, and with different movement rules).

But what is important is only that the pieces have a coherent size. Then comes the insertion of the grid, which is necessary for rendering a full board: you cannot use only the pieces, you also need to represent at least the empty cells. Borders of boards are optional and more related to presentation (just like the use of colors for board cells and pieces, or the styles of forms for pieces and symbols).

> Shogi diagrams are uncheckered (as Shogi boards are), with grid-lines to separate the spaces; traditionally, chess diagrams use the contrast of dark and light squares to distinguish spaces with no grid lines. They may, but do not have to, have dots at some intersections (these mark starting and promotion zones). Graphical diagrams may show images of pieces (pentagonal, with names written in kanji), but typeset diagrams use abbreviations of the piece names as CJK ideographs or kana: e.g. the gold general is 金, and the promoted pawn is と. Instead of "black" and "white", the pieces belonging to the sente player are displayed upright and those belonging to the gote player are rotated 180°. Any proposal for Shogi would have to deal with that.
>
> Xiangqi diagrams are even less like chess diagrams, since Xiangqi is played on the intersections of the board grid, not the spaces. This (and the closely related Korean game of Janggi) is the one you're thinking of with diagonals. Pieces are represented by CJK ideographs in circles (sometimes octagons in the case of Janggi).
>
> The primary issue for typesetting Western chess diagrams is dark squares with and without piece symbols. This issue is irrelevant to East Asian chesslike games.

This is the same issue: there is the same need to represent at least the empty cells, and to be able to count them, even if you don't need to represent the separation of cells (but there may still be a need for gaps where you'll also find non-intersecting horizontal, vertical or diagonal segments, or space, with coherent widths and heights that match the width and height of the cells they border, where you'll center the pieces).

This is the same problem. It is also the same problem as crossword grids, where we also need empty cells (and "black" cells, which are equivalent to an empty cell with a black square symbol instead of letters).

>> I've not seen any box drawing character with such dots or all the needed
>> diagonals.
>
> China included some extensions to the Box Drawing characters in an early proposal for Xiangqi characters <http://www.unicode.org/L2/L2010/10368-n3910.pdf>. AIUI that was shot down <http://www.unicode.org/L2/L2010/10463-chinese-chess.pdf>. China seems to have dropped the matter. Later Xiangqi proposals by Andrew West focused on the circled ideographs and did not pursue new diagram drawing characters, and were eventually successful.

This does not solve the need for grids with black dots (or an equivalent star-like decoration) on some intersecting lines. Box drawing characters don't have such variants and still lack support for diagonals. This is also needed for various classic games using digits.
Non-rectangular grids are also needed: triangular and hexagonal grids are common in various games (for them you'll need "half" cells to fit the alternating rows on the left and right of diagrams).

From unicode at unicode.org Wed Apr 12 01:35:36 2017
From: unicode at unicode.org (Martin J. Dürst via Unicode)
Date: Wed, 12 Apr 2017 15:35:36 +0900
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References: <7249D55E-B774-4079-A212-0E72D81C4966@evertype.com>
Message-ID:

On 2017/04/12 00:44, Philippe Verdy via Unicode wrote:

> Some Asian chess boards include also diagonal lines or dots on top of their
> crossing (notably 9x9 boards are subdivided into nine 3x3 subgroups by such
> dots). These chess boards do not alternate white and black "squares" ;
> beside this, the cells may also be rectangular (longer vertically than
> horizontally)

[mostly OT]

On Go boards, the grid cells are definitely rectangular, not square. The reason for this is that boards are usually looked at at an angle, and having the cells be higher than wide makes them appear (close to) square. However, because diagrams are usually viewed at close to a right angle, Go diagrams use squares, not rectangles.

Regards, Martin.

From unicode at unicode.org Wed Apr 12 02:59:26 2017
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Wed, 12 Apr 2017 08:59:26 +0100
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References: <7249D55E-B774-4079-A212-0E72D81C4966@evertype.com>
Message-ID: <20170412085926.682347fe@JRWUBU2>

On Wed, 12 Apr 2017 06:58:22 +0200 Philippe Verdy via Unicode wrote:

> This is the same problem. The same problem as crossword grids where
> we need also empty cells (and "black" cells which are equivalent to
> an empty cell with a black square symbol instead of letters).
And black cells with notches to accommodate the combining marks etc. above and below from the white cells below and above. I've seen that in Thai crosswords. Now, what character would we bend to accommodate these notches? I suspect this might be taking the encoding of picture elements a bit too far. What about circuit diagrams? Richard. From unicode at unicode.org Wed Apr 12 04:13:37 2017 From: unicode at unicode.org (Andrew West via Unicode) Date: Wed, 12 Apr 2017 10:13:37 +0100 Subject: Xiangqi Game Symbols (was Re: Proposal to add standardized variation sequences for chess notation) Message-ID: On 12 April 2017 at 05:12, Garth Wallace via Unicode wrote: > > Later Xiangqi proposals by Andrew West focused on > the circled ideographs and did not pursue new diagram drawing characters, > and were eventually successful. My Xiangqi proposal (http://www.unicode.org/L2/L2016/16255-n4748-xiangqi.pdf) proposed a minimal set of logical game pieces for Xiangqi/Janggi, regardless of shape (circular or octagonal) or design (traditional characters, simplified characters, cursive characters, or pictures) which I consider a font design issue, and explicitly did not seek to encode circled ideographs. My proposal was rejected, and a different proposal by Michael Everson (http://www.unicode.org/L2/L2016/16270-n4766-xiangqi.pdf) to encode all circled ideographs and negative circled ideographs attested in Xiangqi game diagrams was accepted instead. The accepted proposal for circled ideographs is a glyph encoding model not a character encoding model as for other game symbols (Chess, Dominos, Mahjong, Playing Cards, etc.), and in my opinion it is a very bad model for several reasons. 
It makes the interchange of Xiangqi game data and game diagrams problematic; it hinders normal text processing operations on Xiangqi game pieces (for example, to search for a red horse piece you have to search for three different characters); and in modern computer usage Xiangqi game pieces may not be represented as simple circled ideographs, but may be coloured designs showing characters or images. It is also very likely that vendors will want to produce emoji versions of Xiangqi pieces, and these could not reasonably be considered to be glyph variants of circled ideographs. There has been some negative feedback on the circled ideographs model on the internet, and I believe that Michael has now been convinced that this model is wrong, and should be replaced by a model using logical game pieces. Andrew From unicode at unicode.org Wed Apr 12 04:15:46 2017 From: unicode at unicode.org (Kent Karlsson via Unicode) Date: Wed, 12 Apr 2017 11:15:46 +0200 Subject: Proposal to add standardized variation sequences for chess notation In-Reply-To: Message-ID: Den 2017-04-12 05:14, skrev "Garth Wallace" : > One salient feature the Block Elements have that the Box Drawing characters do > not: distinct LEFT and RIGHT verticals, and LOWER and UPPER horizontals. The > double frame typically consists of a thin line and a thicker line, with one on > the inside and one on the outside, so left and right verticals are not > interchangeable. Even when a single frame is used, it is important for > spacing, since the frame should be flush against the board. 
Note that I used TWO DIFFERENT variation selectors for the horizontal and vertical box drawing characters in my suggestion (marked in bold here):

*2500 FE00; Chessboard box drawing (top); # BOX DRAWINGS LIGHT HORIZONTAL (U+2500)*
*2500 FE01; Chessboard box drawing (bottom); # BOX DRAWINGS LIGHT HORIZONTAL (U+2500)*
*2502 FE00; Chessboard box drawing (left); # BOX DRAWINGS LIGHT VERTICAL (U+2502)*
*2502 FE01; Chessboard box drawing (right); # BOX DRAWINGS LIGHT VERTICAL (U+2502)*
250C FE00; Chessboard box drawing; # BOX DRAWINGS LIGHT DOWN AND RIGHT (U+250C)
2510 FE00; Chessboard box drawing; # BOX DRAWINGS LIGHT DOWN AND LEFT (U+2510)
2514 FE00; Chessboard box drawing; # BOX DRAWINGS LIGHT UP AND RIGHT (U+2514)
2518 FE00; Chessboard box drawing; # BOX DRAWINGS LIGHT UP AND LEFT (U+2518)

*2550 FE00; Chessboard box drawing (top); # BOX DRAWINGS DOUBLE HORIZONTAL (U+2550)*
*2550 FE01; Chessboard box drawing (bottom); # BOX DRAWINGS DOUBLE HORIZONTAL (U+2550)*
*2551 FE00; Chessboard box drawing (left); # BOX DRAWINGS DOUBLE VERTICAL (U+2551)*
*2551 FE01; Chessboard box drawing (right); # BOX DRAWINGS DOUBLE VERTICAL (U+2551)*
2554 FE00; Chessboard box drawing; # BOX DRAWINGS DOUBLE DOWN AND RIGHT (U+2554)
2557 FE00; Chessboard box drawing; # BOX DRAWINGS DOUBLE DOWN AND LEFT (U+2557)
255A FE00; Chessboard box drawing; # BOX DRAWINGS DOUBLE UP AND RIGHT (U+255A)
255D FE00; Chessboard box drawing; # BOX DRAWINGS DOUBLE UP AND LEFT (U+255D)

/Kent K
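A minimal sketch of how the suggested list above could be turned into actual character sequences, assuming each entry is simply "base code point, variation selector, description". The names `SUGGESTED`, `parse_sequences` and `sequences` are made up for illustration, and the sequences themselves are only the suggestion made in this message, not entries registered in StandardizedVariants.txt.

```python
import unicodedata

# Base character / variation selector pairs as listed in the message above.
# These are a suggestion from the thread, not registered variation sequences.
SUGGESTED = """\
2500 FE00; Chessboard box drawing (top)
2500 FE01; Chessboard box drawing (bottom)
2502 FE00; Chessboard box drawing (left)
2502 FE01; Chessboard box drawing (right)
250C FE00; Chessboard box drawing
2510 FE00; Chessboard box drawing
2514 FE00; Chessboard box drawing
2518 FE00; Chessboard box drawing
2550 FE00; Chessboard box drawing (top)
2550 FE01; Chessboard box drawing (bottom)
2551 FE00; Chessboard box drawing (left)
2551 FE01; Chessboard box drawing (right)
2554 FE00; Chessboard box drawing
2557 FE00; Chessboard box drawing
255A FE00; Chessboard box drawing
255D FE00; Chessboard box drawing
"""


def parse_sequences(text):
    """Return (sequence, description) pairs from 'BASE VS; description' lines."""
    result = []
    for line in text.splitlines():
        if not line.strip():
            continue
        codes, description = line.split(";", 1)
        base, selector = (int(code, 16) for code in codes.split())
        result.append((chr(base) + chr(selector), description.strip()))
    return result


sequences = parse_sequences(SUGGESTED)

for sequence, description in sequences:
    base, selector = sequence  # each sequence is exactly two code points
    print(f"U+{ord(base):04X} {unicodedata.name(base)} + "
          f"{unicodedata.name(selector)}: {description}")
```

The point of the two selectors shows up in the output: U+2500 and U+2502 (and their double-line counterparts) each appear twice, once with VARIATION SELECTOR-1 and once with VARIATION SELECTOR-2, while the corners need only one sequence each.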
From unicode at unicode.org Wed Apr 12 04:16:10 2017
From: unicode at unicode.org (Kent Karlsson via Unicode)
Date: Wed, 12 Apr 2017 11:16:10 +0200
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
Message-ID:

Den 2017-04-12 06:12, skrev "Garth Wallace" :

> Shogi diagrams are uncheckered (as Shogi boards are), with grid-lines to
> separate the spaces; traditionally, chess diagrams use the contrast of dark
> and light squares to distinguish spaces with no grid lines. They may, but do
> not have to, have dots at some intersections (these mark starting and
> promotion zones). Graphical diagrams may show images of pieces (pentagonal,
> with names written in kanji), but typeset diagrams use abbreviations of the
> piece names as CJK ideographs or kana: e.g. the gold general is 金, and the
> promoted pawn is と. Instead of "black" and "white", the pieces belonging to
> the sente player are displayed upright and those belonging to the gote player
> are rotated 180°. Any proposal for Shogi would have to deal with that.

OT

Unicode has (only) these for Shogi pieces:

2616;WHITE SHOGI PIECE;So;0;ON;;;;;N;;;;;
2617;BLACK SHOGI PIECE;So;0;ON;;;;;N;;;;;
26C9;TURNED WHITE SHOGI PIECE;So;0;ON;;;;;N;;;;;
26CA;TURNED BLACK SHOGI PIECE;So;0;ON;;;;;N;;;;;

Which seems insufficient...

/Kent K
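For readers unfamiliar with the semicolon-separated lines quoted above, here is a small sketch that reads them as UnicodeData.txt records, assuming the standard field order (code point, name, General_Category, combining class, Bidi_Class, ...). The helper name `parse_unicode_data` and the dictionary keys are made up for illustration.

```python
import unicodedata

# The four records quoted in the message above, in UnicodeData.txt layout.
UNICODE_DATA = """\
2616;WHITE SHOGI PIECE;So;0;ON;;;;;N;;;;;
2617;BLACK SHOGI PIECE;So;0;ON;;;;;N;;;;;
26C9;TURNED WHITE SHOGI PIECE;So;0;ON;;;;;N;;;;;
26CA;TURNED BLACK SHOGI PIECE;So;0;ON;;;;;N;;;;;
"""


def parse_unicode_data(text):
    """Extract a few fields from UnicodeData.txt-style lines."""
    records = []
    for line in text.splitlines():
        fields = line.split(";")
        records.append({
            "code_point": int(fields[0], 16),
            "name": fields[1],
            "general_category": fields[2],
            "bidi_class": fields[4],
        })
    return records


shogi = parse_unicode_data(UNICODE_DATA)

for record in shogi:
    char = chr(record["code_point"])
    # Cross-check the quoted name against the interpreter's own Unicode data.
    known = unicodedata.name(char, "<unknown>")
    print(f"U+{record['code_point']:04X} {char} {record['name']} "
          f"(gc={record['general_category']}, library name: {known})")
```

The four records only distinguish colour and orientation of a generic piece shape; none of them identifies which piece is meant, which is presumably why the set "seems insufficient" for diagrams.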
From unicode at unicode.org Wed Apr 12 06:54:58 2017
From: unicode at unicode.org (Michael Everson via Unicode)
Date: Wed, 12 Apr 2017 12:54:58 +0100
Subject: Xiangqi Game Symbols (was Re: Proposal to add standardized variation sequences for chess notation)
In-Reply-To:
References:
Message-ID: <6255B97D-1147-4BA3-B280-0AD45707B76E@evertype.com>

On 12 Apr 2017, at 10:13, Andrew West via Unicode wrote:

> My Xiangqi proposal (http://www.unicode.org/L2/L2016/16255-n4748-xiangqi.pdf) proposed a minimal set of logical game pieces for Xiangqi/Janggi, regardless of shape (circular or octagonal) or design (traditional characters, simplified characters, cursive characters, or pictures) which I consider a font design issue, and explicitly did not seek to encode circled ideographs. My proposal was rejected, and a different proposal by Michael Everson (http://www.unicode.org/L2/L2016/16270-n4766-xiangqi.pdf) to encode all circled ideographs and negative circled ideographs attested in Xiangqi game diagrams was accepted instead.

Not quite. At the WG2 meeting it was proposed, I believe by experts from the US, to use circled ideographs to represent xiangqi characters. "In for a penny, in for a pound," I thought, and so said that if we were to do that we'd have to encode all the attested circled ideographs, because you can't have a circled 士 (58EB) and say that a circled 仕 (4ED5) is a valid glyph variant of it. Then I wrote that proposal so that we could have an actionable document with which to get characters on the ballot.

> The accepted proposal for circled ideographs is a glyph encoding model not a character encoding model as for other game symbols (Chess, Dominos, Mahjong, Playing Cards, etc.),

This is true.

> and in my opinion it is a very bad model for several reasons.
> It makes the interchange of Xiangqi game data and game diagrams problematic; it hinders normal text processing operations on Xiangqi game pieces (for example, to search for a red horse piece you have to search for three different characters);

Yes, it does. It is important to remember that this use of symbols is a text usage.

> and in modern computer usage Xiangqi game pieces may not be represented as simple circled ideographs, but may be coloured designs showing characters or images.

Or black and white designs showing for instance an actual elephant rather than 象 8C61.

> It is also very likely that vendors will want to produce emoji versions of Xiangqi pieces,

??

> and these could not reasonably be considered to be glyph variants of circled ideographs.

True.

> There has been some negative feedback on the circled ideographs model on the internet, and I believe that Michael has now been convinced that this model is wrong, and should be replaced by a model using logical game pieces.

I was convinced, and my proposals to rectify this were provided as Irish ballot comments to PDAM 1.2.

Michael Everson

From unicode at unicode.org Wed Apr 12 06:55:52 2017
From: unicode at unicode.org (Michael Everson via Unicode)
Date: Wed, 12 Apr 2017 12:55:52 +0100
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References:
Message-ID: <1144B3AE-1EDF-4E2F-9FD6-26A786692BD8@evertype.com>

On 12 Apr 2017, at 10:16, Kent Karlsson via Unicode wrote:

> Unicode has (only) these for Shogi pieces:
>
> 2616;WHITE SHOGI PIECE;So;0;ON;;;;;N;;;;;
> 2617;BLACK SHOGI PIECE;So;0;ON;;;;;N;;;;;
> 26C9;TURNED WHITE SHOGI PIECE;So;0;ON;;;;;N;;;;;
> 26CA;TURNED BLACK SHOGI PIECE;So;0;ON;;;;;N;;;;;
>
> Which seems insufficient...

Yes, we know. One thing at a time, please.
Michael Everson

From unicode at unicode.org Wed Apr 12 07:45:06 2017
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Wed, 12 Apr 2017 14:45:06 +0200
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References: <7249D55E-B774-4079-A212-0E72D81C4966@evertype.com>
Message-ID:

2017-04-12 8:35 GMT+02:00 Martin J. Dürst :

> On 2017/04/12 00:44, Philippe Verdy via Unicode wrote:
>
>> Some Asian chess boards include also diagonal lines or dots on top of their
>> crossing (notably 9x9 boards are subdivided into nine 3x3 subgroups by such
>> dots). These chess boards do not alternate white and black "squares" ;
>> beside this, the cells may also be rectangular (longer vertically than
>> horizontally)
>
> [mostly OT]
>
> On Go boards, the grid cells are definitely rectangular, not square. The
> reason for this is that boards are usually looked at at an angle, and
> having the cells be higher than wide makes them appear (close to) square.
> However, because diagrams are usually viewed at close to a right angle, Go
> diagrams use squares, not rectangles.

That's not a valid reason. "Go" uses **square** cells, not **rectangles**, because of the form of the pieces (round) and the fact that they must nearly touch each other to surround other pieces.

But in Asian chess games the form of the grid cells is adapted to the form of the pieces: the narrow pentagons are traditional, but not required. Many Japanese people playing on a computer screen see non-narrow "pieces" that are in fact just symbols in squares, and the cells will be square. The board is also not observed horizontally by players. The width/height ratio of the grid cells adapts to the ratio of the pieces and the available surface. "Human" chess players play on the ground with square cells (tiles on the floor or drawn strokes).

And these Asian chess games have well-known variants with triangular cells: these use square or round pieces played in equilateral triangle cells.
The narrow pentagonal pieces are not used. But the symbols on the pieces are the same (the rules of the game are different but similar in spirit).

And nothing prevents playing western chess on boards like the traditional Asian ones, with flat pieces that are not necessarily in a square. Rectangular boards with narrow pieces are used only on large boards where players are sitting on both sides; these boards and sets of tiles are expensive. Smaller versions to play on a table use square boards like in many games, with cheaper boards and piece sets.

The game rules remain the same, the form will adapt, and because we are speaking of Unicode encoding, the aspect ratio is not relevant: we are encoding characters, not glyphs with specific aspect ratios. And whether the pieces are represented only by symbols, or by circles or squares, will not change the game: they are just pieces. Likewise the glyphs for pieces can use cursive or simplified sinograms, this does not matter: today most players use pieces with simplified non-cursive strokes. The traditional pieces are in museums or used in specific gaming circles; they are pieces of art.

What I mean is that we are talking about a proposal that is futile: it does not cover the real needs. What we currently have in Unicode is a set of characters for pieces, but the proposal attempts to mix concepts to represent something other than pieces, i.e. some boards.

I strongly disapprove of the proposal (based on variants for pieces) and would prefer a more general model applicable to many other games (including crosswords): a set of characters to represent board cells independently of the pieces played in them. Then use the existing characters (letters/digits/symbols/emojis...) with them. The encoding should just be: when there's a piece inside or just for the absence of piece. An empty board will just be rows of and no piece.

For western chess we just need two base cells: white cell and black cell.
For Asian chess games only one is needed (size ratio does not matter), the white cell.

Then comes the encoding of grids: these could be variants of cells, square/rectangle being the default; two variants would be used for triangular cells and hexagonal cells, but we would need left-half and right-half cells for them. The encoding would be . The triangular variants in fonts would have left-side and right-side bearings that would be removed and would become negative via standard kerning when they are encoded side by side (just like there are kerning pairs in "AVAVAVA"). The hexagonal variants would extend a bit above and below the standard line-height (vertical kerning would occur between rows).

For go we need cell variants with border lines passing through their center: they are like the existing box-drawing characters (single stroke versions), except there's a lack of a cell with a central dot. These should be . The pieces are the existing white and black "stones" (which may also exist with their own emoji-like variant).

We can also encode

With such a system we can represent many, many more games. It is much more general, and it will also work for draughts, crosswords, "scrabble", "boggle", and really many, many games.

From unicode at unicode.org Wed Apr 12 08:48:37 2017
From: unicode at unicode.org (Julian Bradfield via Unicode)
Date: Wed, 12 Apr 2017 14:48:37 +0100
Subject: Proposal to add standardized variation sequences for chess notation
References: <7249D55E-B774-4079-A212-0E72D81C4966@evertype.com>
Message-ID:

On 2017-04-12, Philippe Verdy via Unicode wrote:

> 2017-04-12 8:35 GMT+02:00 Martin J. Dürst :
>> On Go boards, the grid cells are definitely rectangular, not square. The
>> reason for this is that boards are usually looked at at an angle, and
>> having the cells be higher than wide makes them appear (close to) square.
>> However, because diagrams are usually viewed at close to a right angle, Go >> diagrams use squares, not rectangles. > > That's not a valid reason. "Go" uses **square** cells not **rectangles*** > because of the form of the pieces (round) and the fact they must nearly > touch each other to surround other pieces. I don't think Go players and board makers have any interest in your views of valid reasons. According to the information provided by various national Go societies, the typical Japanese Go cell is 22mm by 23.6mm, for the reason Martin stated. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From unicode at unicode.org Wed Apr 12 09:58:36 2017 From: unicode at unicode.org (Garth Wallace via Unicode) Date: Wed, 12 Apr 2017 14:58:36 +0000 Subject: Xiangqi Game Symbols (was Re: Proposal to add standardized variation sequences for chess notation) In-Reply-To: References: Message-ID: On Wed, Apr 12, 2017 at 2:13 AM Andrew West wrote: > On 12 April 2017 at 05:12, Garth Wallace via Unicode > wrote: > > > > Later Xiangqi proposals by Andrew West focused on > > the circled ideographs and did not pursue new diagram drawing characters, > > and were eventually successful. > > My Xiangqi proposal > (http://www.unicode.org/L2/L2016/16255-n4748-xiangqi.pdf) proposed a > minimal set of logical game pieces for Xiangqi/Janggi, regardless of > shape (circular or octagonal) or design (traditional characters, > simplified characters, cursive characters, or pictures) which I > consider a font design issue, and explicitly did not seek to encode > circled ideographs. My proposal was rejected, and a different proposal > by Michael Everson > (http://www.unicode.org/L2/L2016/16270-n4766-xiangqi.pdf) to encode > all circled ideographs and negative circled ideographs attested in > Xiangqi game diagrams was accepted instead. > Ah, I misremembered, sorry. 
> > The accepted proposal for circled ideographs is a glyph encoding model > not a character encoding model as for other game symbols (Chess, > Dominos, Mahjong, Playing Cards, etc.), and in my opinion it is a very > bad model for several reasons. It makes the interchange of Xiangqi > game data and game diagrams problematic; it hinders normal text > processing operations on Xiangqi game pieces (for example, to search > for a red horse piece you have to search for three different > characters); and in modern computer usage Xiangqi game pieces may not > be represented as simple circled ideographs, but may be coloured > designs showing characters or images. It is also very likely that > vendors will want to produce emoji versions of Xiangqi pieces, and > these could not reasonably be considered to be glyph variants of > circled ideographs. There has been some negative feedback on the > circled ideographs model on the internet, and I believe that Michael > has now been convinced that this model is wrong, and should be > replaced by a model using logical game pieces. > > Andrew So has that proposal been retracted now? > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Wed Apr 12 10:10:27 2017 From: unicode at unicode.org (Andrew West via Unicode) Date: Wed, 12 Apr 2017 16:10:27 +0100 Subject: Xiangqi Game Symbols (was Re: Proposal to add standardized variation sequences for chess notation) In-Reply-To: References: Message-ID: On 12 April 2017 at 15:58, Garth Wallace wrote: > > So has that proposal been retracted now? Once a proposal has been approved it cannot simply be retracted by the submitter. On the SC2 side, the proposed characters have been subject to ballot comments from national bodies, and no doubt they will be discussed at the WG2 meeting in Hohhot later this year. 
Andrew

From unicode at unicode.org Wed Apr 12 11:27:31 2017
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Wed, 12 Apr 2017 18:27:31 +0200
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References: <7249D55E-B774-4079-A212-0E72D81C4966@evertype.com>
Message-ID:

2017-04-12 15:48 GMT+02:00 Julian Bradfield via Unicode :

> On 2017-04-12, Philippe Verdy via Unicode wrote:
> > 2017-04-12 8:35 GMT+02:00 Martin J. Dürst :
> >> On Go boards, the grid cells are definitely rectangular, not square. The
> >> reason for this is that boards are usually looked at at an angle, and
> >> having the cells be higher than wide makes them appear (close to) square.
> >> However, because diagrams are usually viewed at close to a right angle, Go
> >> diagrams use squares, not rectangles.
> >
> > That's not a valid reason. "Go" uses **square** cells not **rectangles**
> > because of the form of the pieces (round) and the fact they must nearly
> > touch each other to surround other pieces.
>
> I don't think Go players and board makers have any interest in your
> views of valid reasons.
> According to the information provided by various national Go
> societies, the typical Japanese Go cell is 22mm by 23.6mm, for the
> reason Martin stated.

This is nearly square, and optically square for boards played on a table (really a very minor detail); on a screen or in printed diagrams the cells are obviously squares.

We were talking about Xiangqi and similar traditional boards that are really rectangular and played with pieces that are really narrow (but here again this does not apply to playing on screen, or to larger "boards" drawn on the floor with human actors instead of pieces).
From unicode at unicode.org Wed Apr 12 20:56:14 2017
From: unicode at unicode.org (Garth Wallace via Unicode)
Date: Wed, 12 Apr 2017 18:56:14 -0700
Subject: Proposal to add standardized variation sequences for chess notation
In-Reply-To:
References:
Message-ID:

On Wed, Apr 12, 2017 at 2:15 AM, Kent Karlsson wrote:

> Den 2017-04-12 05:14, skrev "Garth Wallace" :
>
>> One salient feature the Block Elements have that the Box Drawing
>> characters do not: distinct LEFT and RIGHT verticals, and LOWER and UPPER
>> horizontals. The double frame typically consists of a thin line and a
>> thicker line, with one on the inside and one on the outside, so left and
>> right verticals are not interchangeable. Even when a single frame is used,
>> it is important for spacing, since the frame should be flush against the
>> board.
>
> Note that I used TWO DIFFERENT variation selectors for the horizontal and
> vertical box drawing characters in my suggestion (marked in bold here):
>
> *2500 FE00; Chessboard box drawing (top); # BOX DRAWINGS LIGHT HORIZONTAL (U+2500)*
> *2500 FE01; Chessboard box drawing (bottom); # BOX DRAWINGS LIGHT HORIZONTAL (U+2500)*
> *2502 FE00; Chessboard box drawing (left); # BOX DRAWINGS LIGHT VERTICAL (U+2502)*
> *2502 FE01; Chessboard box drawing (right); # BOX DRAWINGS LIGHT VERTICAL (U+2502)*
> 250C FE00; Chessboard box drawing; # BOX DRAWINGS LIGHT DOWN AND RIGHT (U+250C)
> 2510 FE00; Chessboard box drawing; # BOX DRAWINGS LIGHT DOWN AND LEFT (U+2510)
> 2514 FE00; Chessboard box drawing; # BOX DRAWINGS LIGHT UP AND RIGHT (U+2514)
> 2518 FE00; Chessboard box drawing; # BOX DRAWINGS LIGHT UP AND LEFT (U+2518)
>
> *2550 FE00; Chessboard box drawing (top); # BOX DRAWINGS DOUBLE HORIZONTAL (U+2550)*
> *2550 FE01; Chessboard box drawing (bottom); # BOX DRAWINGS DOUBLE HORIZONTAL (U+2550)*
> *2551 FE00; Chessboard box drawing (left); # BOX DRAWINGS DOUBLE VERTICAL (U+2551)*
> *2551 FE01; Chessboard box drawing (right); # BOX DRAWINGS DOUBLE VERTICAL (U+2551)*
> 2554 FE00; Chessboard box drawing; # BOX DRAWINGS DOUBLE DOWN AND RIGHT (U+2554)
> 2557 FE00; Chessboard box drawing; # BOX DRAWINGS DOUBLE DOWN AND LEFT (U+2557)
> 255A FE00; Chessboard box drawing; # BOX DRAWINGS DOUBLE UP AND RIGHT (U+255A)
> 255D FE00; Chessboard box drawing; # BOX DRAWINGS DOUBLE UP AND LEFT (U+255D)
>
> /Kent K

Ah, I missed that in the earlier message, sorry.

From unicode at unicode.org Fri Apr 14 17:25:17 2017
From: unicode at unicode.org (Marcel Schneider via Unicode)
Date: Sat, 15 Apr 2017 00:25:17 +0200 (CEST)
Subject: Superscript and Subscript Characters in General Use / Re: French Superscript Abbreviations Fit Plain Text Requirements
Message-ID: <591545298.19786.1492208717277.JavaMail.www@wwinf1p12>

On Mon, 23 Jan 2017 10:30:17 +0100 (CET), I wrote:

> [...] I now believe and will
> spread the word that [...] on the other
> hand, the recommendations in TUS may be considered a mere official discourse for
> encoding process management purposes, but with little through no real impact on
> actual practice.

http://www.unicode.org/mail-arch/unicode-ml/y2017-m01/0119.html

That had little of a real intent, and much of a formatted discourse (thus a kind of not fully accurate utterance) intended to make readers aware that untrue statements made on behalf of some entity worsen the image of that entity and lessen the overall reliability of related products. In this specific context, I believed that an overstating shortcut could have the most impact. It was foreseeable that the weapons used by the UTC would be turned against the UTC itself at some point.

Having said that, I still always try to overcome and to make the best of what exists, e.g. by providing a facility for input of some of the most useful Latin superscript small letters along with the other hard-to-access characters of the target locale.
Regards,
Marcel

From unicode at unicode.org Sat Apr 15 16:33:21 2017
From: unicode at unicode.org (Marcel Schneider via Unicode)
Date: Sat, 15 Apr 2017 23:33:21 +0200 (CEST)
Subject: Public review of draft repertoire for ISO/IEC 10646
Message-ID: <344050077.12931.1492292001280.JavaMail.www@wwinf1c07>

On Sat, 16 Jul 2016 06:14:45 +0200 (CEST), I wrote(1):

> I note that now that the Unicode repertoire is built at cruise speed,
> few to no feedback items are reported.[1][2]
> [1] http://www.unicode.org/review/pri327/feedback.html
> [2] http://www.unicode.org/review/pri328/feedback.html

It seems that while many characters provided in Amendment 1 (PDAM) to ISO/IEC 10646:2016 (5th edition) didn't make it into Unicode so far, many others that are now in beta didn't show up in the draft additional repertoire for ISO/IEC 10646:2016 (5th edition) DIS either.

That is enough of an explanation why, to date, the UTC is facing a number of feedback items requesting name changes and even code point swapping,(2) which is otherwise considered not actionable.(3)

I'd suggest providing enough descriptors for each character (if not in the name, then at least in the aliases) so as to avoid any ambiguity.(4)

(At my level, I'm implementing almost all feedback on PRI#350 when localizing the data for human reading. Including name changes.)

Regards,
Marcel

(1) http://www.unicode.org/mail-arch/unicode-ml/y2016-m07/0002.html
(2) http://www.unicode.org/review/pri350/
(3) http://www.unicode.org/review/index.html#feedback
(4) http://www.unicode.org/mail-arch/unicode-ml/y2017-m03/0087.html

From unicode at unicode.org Mon Apr 17 10:23:51 2017
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Mon, 17 Apr 2017 16:23:51 +0100
Subject: Rationale for IPC of Newa Dependent Vowels
Message-ID: <20170417162351.3c12fdc0@JRWUBU2>

I have doubts about the Indic_Positional_Category (InPC) values proposed for four new dependent vowels being added in Unicode 10.0.0.
On examining the vowel chart (p1265 of http://www.unicode.org/Public/10.0.0/charts/CodeCharts.pdf) one may feel quite comfortable with assigning the property values:

1143E..1143F ; Top # Mn [2] NEWA VOWEL SIGN E..NEWA VOWEL SIGN AI
11440..11441 ; Right # Mc [2] NEWA VOWEL SIGN O..NEWA VOWEL SIGN AU

However, on consulting Section 3.6 of Anshuman Pandey's 'Proposal to Encode the Newar Script in ISO/IEC 10646' http://www.unicode.org/L2/L2012/12003r-newar.pdf, one finds that after the seven headless consonants GA, NYA, TTHA, NNA, THA, DHA and SHA, the dependent vowels take forms more appropriate to the property values

1143E ; Left # Mn NEWA VOWEL SIGN E
1143F ; Top_and_Left # Mn NEWA VOWEL SIGN AI
11440..11441 ; Left_and_Right # Mc [2] NEWA VOWEL SIGN O..NEWA VOWEL SIGN AU

Now, I have no idea what the effect of a right-to-left directional override should be on the combining marks, but in general I believe gc=Mn makes more sense for U+1143E and U+1143F, so I am not challenging that property assignment.

However, I do wonder what the best property values are for a renderer, such as Microsoft's Universal Shaping Engine. It seems to me that it may be better to start with the properties involving 'Left' and use contextual substitutions to convert the dependent vowels to components of the correct position. However, this does seem more complicated than the general decomposition of multipart vowels. In particular, for headed consonants, a glyph substitution is required to replace the head by a part of the vowel symbol; the default glyph will not be appropriate. It is entirely possible that a font will simply replace a headed consonant and any of these four vowels by a ligature glyph, leaving reordering to be considered only for the seven headless consonants.

Has this matter been considered? If so, is the rationale recorded anywhere?

Richard.
From unicode at unicode.org Tue Apr 18 15:54:15 2017
From: unicode at unicode.org (Christoph Päper via Unicode)
Date: Tue, 18 Apr 2017 22:54:15 +0200 (CEST)
Subject: Extended_Pictographic in Unicode Utilities
Message-ID: <1033062992.81914.1492548855994.JavaMail.open-xchange@app07.ox.hosteurope.de>

Maybe I just need to be patient a bit longer, but maybe I'm missing something: Since CLDR and ICU have been updated recently and the Extended_Pictographic codepoint property has been added, shouldn't it be accessible within Unicode Utilities: UnicodeSet?

From unicode at unicode.org Wed Apr 19 18:35:39 2017
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Thu, 20 Apr 2017 00:35:39 +0100
Subject: Counting Devanagari Aksharas
Message-ID: <20170420003539.205acdb4@JRWUBU2>

Is there consensus on how to count aksharas in the Devanagari script? The doubts I have relate to a visible halant in orthographic syllables other than the first.

For example, according to 'Devanagari VIP Team Issues Report' http://www.unicode.org/L2/L2011/11370-devanagari-vip-issues.pdf, a derived form from Nepali श्रीमान् should be written श्रीमान्‌को (with U+200C ZERO WIDTH NON-JOINER after the final virama) and not श्रीमान्को (without it). Now, if the font used has a conjunct for SHRA, I would count the former as having 4 aksharas SH.RII, MAA, N, KO and the latter as having 3 aksharas SH.RII, MAA, N.KO.

If the font leads to the use of a visible halant instead of the vattu conjunct SH.RA, as happens when I view this email, would there then be 5 and 4 aksharas respectively? A further complication is that the font chosen treats what looks like SH, RA as a conjunct; the vowel I appears to the left of SH when added after RA (श्रि).

Richard.
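
A minimal sketch of the counting question above, assuming the two spellings are SHA, VIRAMA, RA, II, MA, AA, NA, VIRAMA, (ZWNJ,) KA, O, and assuming one simple rule set: a consonant joins the preceding akshara only when it directly follows a virama. The `font_joins` predicate is hypothetical and stands in for "does the font have this conjunct". Neither the rule set nor the predicate is taken from UAX#29 or from any renderer; they only make the 4/3 and 5/4 counts concrete.

```python
VIRAMA = "\u094D"
ZWNJ = "\u200C"


def is_consonant(ch):
    # Assumption: only the basic consonants KA..HA (U+0915..U+0939) are handled.
    return "\u0915" <= ch <= "\u0939"


def split_aksharas(text, font_joins=lambda first, second: True):
    """Split text into orthographic syllables under the simple rules above.

    Non-consonants (vowel signs, virama, ZWNJ) attach to the preceding unit.
    A consonant attaches only if it directly follows consonant + virama and
    the hypothetical font_joins(first, second) says the conjunct exists.
    ZWNJ after the virama therefore blocks the join.
    """
    units = []
    for ch in text:
        if not units:
            joins = False
        elif not is_consonant(ch):
            joins = True
        elif units[-1].endswith(VIRAMA) and len(units[-1]) >= 2:
            joins = font_joins(units[-1][-2], ch)
        else:
            joins = False
        if joins:
            units[-1] += ch
        else:
            units.append(ch)
    return units


# Escapes are used so the example does not depend on how the text is displayed.
with_zwnj = "\u0936\u094D\u0930\u0940\u092E\u093E\u0928\u094D\u200C\u0915\u094B"
without_zwnj = "\u0936\u094D\u0930\u0940\u092E\u093E\u0928\u094D\u0915\u094B"


def no_shra(first, second):
    # Models a font that lacks the SH.RA conjunct but has every other one.
    return (first, second) != ("\u0936", "\u0930")


print(len(split_aksharas(with_zwnj)), len(split_aksharas(without_zwnj)))  # 4 3
print(len(split_aksharas(with_zwnj, no_shra)),
      len(split_aksharas(without_zwnj, no_shra)))  # 5 4
```

The point of passing the font as a parameter is that the count is not a function of the code points alone, which is exactly the doubt raised in the message.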
From unicode at unicode.org Thu Apr 20 02:49:49 2017 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Thu, 20 Apr 2017 08:49:49 +0100 Subject: Counting Devanagari Aksharas In-Reply-To: References: <20170420003539.205acdb4@JRWUBU2> Message-ID: <20170420084949.25d47dd9@JRWUBU2> I was offered the following reply: > To my knowledge except in Tamil script vowel less consonants in > written form aren't considered as separate "akshara"s in native > terminology. Word-finally they seem to be being treated as such. To be more precise, a final cluster of one or more consonants marked as having no vowel is - Sanskrit has a few word-final clusters. > However for text shaping purposes they will surely have > to be considered as separate orthographic syllables in Unicode > terminology since in word end position they can sometimes carry svara > markers. The complication comes word internally. My understanding is that phonetically syllable-final consonants in non-Indic words in non-Indic languages have a tendency not to be included in an akshara along with the start of the next syllable. However, that tendency is more evident in scripts other than Devanagari; Devanagari has developed in the context of Indic languages. Renderers' syllable-recognition algorithms will naturally treat word-final devowelled sequences as separate units, rather than associate them with the previous implicit or explict vowel. Burmese is a good example of what can happen with a non-Indic language; in native words, phonetic syllabic boundaries tend to be orthographic syllable boundaries. Text-shaping engines like Microsoft's Uniscribe are more complicated. For scripts with a virama, they seem to assume that the virama may be a combining operator, and wait for data from the font to decide how many clusters to form. One test is the insertion of white spaces in a word when it is stretched out. 
Of course, that test can only be applied where human decisions are involved - otherwise we are just looking at what dominant renderers are actually doing, rather than looking at what they ought to be doing.

Richard.

From unicode at unicode.org Thu Apr 20 05:03:37 2017
From: unicode at unicode.org (Shriramana Sharma via Unicode)
Date: Thu, 20 Apr 2017 15:33:37 +0530
Subject: Counting Devanagari Aksharas
In-Reply-To: <20170420084949.25d47dd9@JRWUBU2>
References: <20170420003539.205acdb4@JRWUBU2> <20170420084949.25d47dd9@JRWUBU2>
Message-ID:

Hello Richard. Yes, my earlier reply wasn't intended to be offlist.

I have near-zero knowledge about non-Indic languages.

All I can say is that Tamil script has eschewed most consonant cluster ligatures/conjoining forms. As for Devanagari, writing श्रीमान्‌को (I used ZWNJ) instead of श्रीमान्को is quite possible with existing technology. The latter would be Sanskrit orthography and the former perhaps Hindi, although I wouldn't know why anyone would want to run in the को with the preceding श्रीमान् even in Hindi.

And IMO it would be better to clearly define at the outset what you meant by "akshara" in your question, to avoid confusion among people replying with a different idea of the meaning of that term.

--
Shriramana Sharma ???????????? ????????????

From unicode at unicode.org Thu Apr 20 13:17:05 2017
From: unicode at unicode.org (Manish Goregaokar via Unicode)
Date: Thu, 20 Apr 2017 11:17:05 -0700
Subject: Counting Devanagari Aksharas
In-Reply-To: <20170420003539.205acdb4@JRWUBU2>
References: <20170420003539.205acdb4@JRWUBU2>
Message-ID:

I don't think there's consensus.

When given a rendered representation people seem to uniformly count conjuncts as multiple aksharas if rendered with visible halant, and as a single akshara if they are rendered conjoined.

Most fonts for Devanagari these days are pretty good at conjoining consonants. They seem to do so for all common conjuncts, and usually for most practical (i.e.
not ridiculously long) conjuncts. I've never seen a visible halant in text I've read. I'm of the opinion that Unicode should start considering devanagari (and possibly other indic) consonant clusters as single extended grapheme clusters. Yes, sometimes it's not rendered as a single glyph, but sometimes family emoji will not render as a single glyph either (if you use skin tones or more than 4 family members) and we still consider those EGCs. -Manish On Wed, Apr 19, 2017 at 4:35 PM, Richard Wordingham via Unicode wrote: > Is there consensus on how to count aksharas in the Devanagari script? > The doubts I have relate to a visible halant in orthographic syllables > other than the first. > > For example, according to 'Devanagari VIP Team Issues Report' > http://www.unicode.org/L2/L2011/11370-devanagari-vip-issues.pdf, a > derived form from Nepali ???????? should be written ??????????? > DEVANAGARI LETTER RA, U+0940 DEVANAGARI VOWEL SIGN II, U+092E > DEVANAGARI LETTER MA, U+093E DEVANAGARI VOWEL SIGN AA, U+0928 > DEVANAGARI LETTER NA, U+094D, U+200C ZERO WIDTH NON-JOINER, U+0915 > DEVANAGARI LETTER KA, U+094B DEVANAGARI VOWEL SIGN O> and not > ?????????? U+094D, U+0915, U+094B>. Now, if the font used has a conjunct for > SHRA, I would count the former as having 4 aksharas SH.RII, MAA, N, KO > and the latter as having 3 aksharas SH.RII, MAA, N.KO. > > If the font leads to the use of a visible halant instead of the vattu > conjunct SH.RA, as happens when I view this email, would there then be > 5 and 4 aksharas respectively? A further complication is that the font > chosen treats what looks like SH, RA as a conjunct; the vowel I appears > to the left of SH when added after RA (????). > > Richard. 
>

From unicode at unicode.org Thu Apr 20 13:30:25 2017
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Thu, 20 Apr 2017 19:30:25 +0100
Subject: Counting Devanagari Aksharas
In-Reply-To:
References: <20170420003539.205acdb4@JRWUBU2> <20170420084949.25d47dd9@JRWUBU2>
Message-ID: <20170420193025.5087bf3a@JRWUBU2>

On Thu, 20 Apr 2017 15:33:37 +0530 Shriramana Sharma via Unicode wrote:

> All I can say is that Tamil script has eschewed most consonant cluster
> ligatures/conjoining forms. As for Devanagari, writing श्रीमान्‌को (I
> used ZWNJ) instead of श्रीमान्को is quite possible with existing technology.
> The latter would be Sanskrit orthography and the former perhaps Hindi,
> although I wouldn't know why anyone would want to run in the को with
> the preceding श्रीमान् even in Hindi.

According to p23 of http://www.unicode.org/L2/L2011/11370-devanagari-vip-issues.pdf, it's Nepali. It's a compromise between श्रीमान्को and Hindi-style श्रीमान् को.

> And IMO it would be better to
> clearly define at the outset what you meant by "akshara" in your
> question, to avoid confusion among people replying with a different
> idea of the meaning of that term.

I didn't want to be any more precise than "orthographic syllable". Swaran Lata is urging, in submission http://www.unicode.org/L2/L2017/17094-indic-text-seg.pdf to the UTC, that UAX#29 "Unicode Text Segmentation" adopt a rather naïve definition of an Indian orthographic syllable. The worst outcome in my opinion would be if it were adopted for the extended grapheme cluster definition - it would make editing orthographic clusters even more difficult. However, it would make sense for CLDR to carry localised definitions.

For layout, the definition would be relevant for 'drop capital effects' and for the analogue of inserting spaces between letters. There are recommendations in a maturing W3C specification for Indic layout, though to be fair the specification fairly quickly restricts its scope to Indian scripts.
Now, if the spacing were applied to the Nepali word श्रीमान्‌को I would expect to see something like श्री मा न् को, as the base word itself would appear as श्री मा न् when subjected to the same treatment. However, before suggesting minor improvements that might be in order, I thought I should check whether there was agreement that terminated an orthographic syllable. It now seems that any general agreement would in fact be that it did *not* terminate an orthographic syllable!

I must say that stretching श्रीमान्‌को out as श्री मा न्‌को feels wrong. If my feeling is right, then the definition of orthographic syllable, if it can be done without reference to a font, belongs in CLDR, as UAX#29 implies, and not in the Unicode Character Database and Unicode standards.

Richard.

From unicode at unicode.org Thu Apr 20 14:14:10 2017
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Thu, 20 Apr 2017 20:14:10 +0100
Subject: Counting Devanagari Aksharas
In-Reply-To:
References: <20170420003539.205acdb4@JRWUBU2>
Message-ID: <20170420201410.70377691@JRWUBU2>

On Thu, 20 Apr 2017 11:17:05 -0700 Manish Goregaokar via Unicode wrote:

> When given a rendered representation people seem to uniformly count
> conjuncts as multiple aksharas if rendered with visible halant, and as
> a single akshara if they are rendered conjoined.

Now, that's what I expected.

> I'm of the opinion that Unicode should start considering devanagari
> (and possibly other indic) consonant clusters as single extended
> grapheme clusters. Yes, sometimes it's not rendered as a single glyph,
> but sometimes family emoji will not render as a single glyph either
> (if you use skin tones or more than 4 family members) and we still
> consider those EGCs.

You won't like it if cursor movement granularity is reduced to one extended grapheme cluster. I'm grateful that Emacs allows me to delete and replace the first NFC character of a grapheme cluster.

Richard.
From unicode at unicode.org Thu Apr 20 16:14:00 2017 From: unicode at unicode.org (Manish Goregaokar via Unicode) Date: Thu, 20 Apr 2017 14:14:00 -0700 Subject: Counting Devanagari Aksharas In-Reply-To: <20170420201410.70377691@JRWUBU2> References: <20170420003539.205acdb4@JRWUBU2> <20170420201410.70377691@JRWUBU2> Message-ID: I mean, we do the same for Hangul. The main time you need intra-conjunct segmentation in Devanagari is when deleting something you just typed. And backspace usually operates on code points anyway (except for some weird cases like flag emoji, though this isn't uniform across platforms). I don't see how intra-conjunct selection would be useful otherwise. -Manish On Thu, Apr 20, 2017 at 12:14 PM, Richard Wordingham via Unicode wrote: > On Thu, 20 Apr 2017 11:17:05 -0700 > Manish Goregaokar via Unicode wrote: > >> When given a rendered representation people seem to uniformly count >> conjuncts as multiple aksharas if rendered with visible halant, and as >> a single akshara if they are rendered conjoined. > > Now, that's what I expected. > >> I'm of the opinion that Unicode should start considering devanagari >> (and possibly other indic) consonant clusters as single extended >> grapheme clusters. Yes, sometimes it's not rendered as a single glyph, >> but sometimes family emoji will not render as a single glyph either >> (if you use skin tones or more than 4 family members) and we still >> consider those EGCs. > > You won't like it if cursor movement granularity is reduced to one > extended grapheme cluster. I'm grateful that Emacs allows me to > delete and replace the first NFC character of a grapheme cluster. > > Richard. 
From unicode at unicode.org Thu Apr 20 20:19:05 2017 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Fri, 21 Apr 2017 02:19:05 +0100 Subject: Counting Devanagari Aksharas In-Reply-To: References: <20170420003539.205acdb4@JRWUBU2> <20170420201410.70377691@JRWUBU2> Message-ID: <20170421021905.5de801c5@JRWUBU2> On Thu, 20 Apr 2017 14:14:00 -0700 Manish Goregaokar via Unicode wrote: > On Thu, Apr 20, 2017 at 12:14 PM, Richard Wordingham via Unicode > wrote: > > On Thu, 20 Apr 2017 11:17:05 -0700 > > Manish Goregaokar via Unicode wrote: > >> I'm of the opinion that Unicode should start considering devanagari > >> (and possibly other indic) consonant clusters as single extended > >> grapheme clusters. > > You won't like it if cursor movement granularity is reduced to one > > extended grapheme cluster. I'm grateful that Emacs allows me to > I mean, we do the same for Hangul. Hangul is generally a maximum of three characters, which is about the border of tolerance. I find it irritating to have to completely retype Thai grapheme clusters of consonant, vowel and tone mark. There were loud protests from the Thais when preposed vowels were added to the Thai grapheme cluster and implementations then responded, and Unicode quickly removed them. Now imagine you're typing Vedic Sanskrit, with its clusters and pitch indicators. > The main time you need intra-conjunct segmentation in Devanagari is > when deleting something you just typed. You'll typically be several words beyond by the time you notice, or by the time a spell-checker spots a problem. Richard. 
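
To make the granularity complaint concrete, here is a rough sketch that lists cursor stops by code point and by cluster. The `clusters` helper is an assumption: it approximates a grapheme cluster as a base followed by marks (General_Category M*), which is far cruder than the UAX#29 rules but is enough for a Thai consonant, vowel and tone mark.

```python
import unicodedata


def clusters(text):
    """Approximate clusters: each non-mark starts a cluster, marks attach."""
    out = []
    for ch in text:
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def cursor_stops(text, by_cluster):
    """Indices (in code points) at which the cursor may rest."""
    if not by_cluster:
        return list(range(len(text) + 1))
    stops = [0]
    for cluster in clusters(text):
        stops.append(stops[-1] + len(cluster))
    return stops


thai = "\u0E01\u0E35\u0E48"      # KO KAI, SARA II, MAI EK
preposed = "\u0E40\u0E01\u0E48"  # SARA E (preposed vowel), KO KAI, MAI EK

print(cursor_stops(thai, by_cluster=False))     # [0, 1, 2, 3]
print(cursor_stops(thai, by_cluster=True))      # [0, 3]
print(cursor_stops(preposed, by_cluster=True))  # [0, 1, 3]
```

With cluster-granular movement there is no stop between the consonant and its marks, so correcting the consonant means retyping all three code points. The preposed vowel stays separately addressable, which matches the Thai outcome described above.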
From unicode at unicode.org Fri Apr 21 00:08:24 2017 From: unicode at unicode.org (Anshuman Pandey via Unicode) Date: Fri, 21 Apr 2017 00:08:24 -0500 Subject: Counting Devanagari Aksharas In-Reply-To: <20170421021905.5de801c5@JRWUBU2> References: <20170420003539.205acdb4@JRWUBU2> <20170420201410.70377691@JRWUBU2> <20170421021905.5de801c5@JRWUBU2> Message-ID: <5994B215-DA07-4048-BE2E-06AB38D19ABA@umich.edu> > On Apr 20, 2017, at 8:19 PM, Richard Wordingham via Unicode wrote: > > On Thu, 20 Apr 2017 14:14:00 -0700 > Manish Goregaokar via Unicode wrote: > >> On Thu, Apr 20, 2017 at 12:14 PM, Richard Wordingham via Unicode >> wrote: > >>> On Thu, 20 Apr 2017 11:17:05 -0700 >>> Manish Goregaokar via Unicode wrote: > >>>> I'm of the opinion that Unicode should start considering devanagari >>>> (and possibly other indic) consonant clusters as single extended >>>> grapheme clusters. > >>> You won't like it if cursor movement granularity is reduced to one >>> extended grapheme cluster. I'm grateful that Emacs allows me to > >> I mean, we do the same for Hangul. > > Hangul is generally a maximum of three characters, which is about the > border of tolerance. I find it irritating to have to completely retype > Thai grapheme clusters of consonant, vowel and tone mark. There were > loud protests from the Thais when preposed vowels were added to the > Thai grapheme cluster and implementations then responded, and Unicode > quickly removed them. Now imagine you're typing Vedic Sanskrit, with its > clusters and pitch indicators. I tried typing Vedic Sanskrit, and it seems to work: http://pandey.pythonanywhere.com/devsyll Haven't tried the orthographic oddity of the Nepali case in question. Above my pay grade. If you access the above link on an iOS device you'll see tofu and missing characters. Apple's Devanagari font needs to be fixed. - AP -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From unicode at unicode.org Fri Apr 21 02:23:33 2017 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Fri, 21 Apr 2017 08:23:33 +0100 Subject: Counting Devanagari Aksharas In-Reply-To: <5994B215-DA07-4048-BE2E-06AB38D19ABA@umich.edu> References: <20170420003539.205acdb4@JRWUBU2> <20170420201410.70377691@JRWUBU2> <20170421021905.5de801c5@JRWUBU2> <5994B215-DA07-4048-BE2E-06AB38D19ABA@umich.edu> Message-ID: <20170421082333.4232cf67@JRWUBU2> On Fri, 21 Apr 2017 00:08:24 -0500 Anshuman Pandey via Unicode wrote: > > On Apr 20, 2017, at 8:19 PM, Richard Wordingham via Unicode > > wrote: > > Now imagine you're > > typing Vedic Sanskrit, with its clusters and pitch indicators. > I tried typing Vedic Sanskrit, and it seems to work: > http://pandey.pythonanywhere.com/devsyll That should demonstrate nothing relevant if you type correctly first time. The issue comes when you mistype and have to correct, to give the usual worst case, the first letter of a conjunct. Now, I looked at your page in Firefox on Ubuntu, and I found the cursor seemed to move by extended grapheme cluster. That means that to change a consonant you have to retype the following marks. I did find two issues with your analyser. Firstly, it broke ??????????? into ????????????, which does not concatenate back to the original. Secondly, you have a problem with ANUDATTA. You are not accepting as a syllable. Perhaps you believed https://www.microsoft.com/typography/OpenTypeDev/devanagari/intro.htm as to the structure of a Devanagari syllable. I suspect ANUDATTA as a consonant modifier went out when U+097B DEVANAGARI LETTER GGA and the like came in. Richard. 
From unicode at unicode.org Fri Apr 21 13:18:00 2017 From: unicode at unicode.org (Manish Goregaokar via Unicode) Date: Fri, 21 Apr 2017 11:18:00 -0700 Subject: Counting Devanagari Aksharas In-Reply-To: <20170421082333.4232cf67@JRWUBU2> References: <20170420003539.205acdb4@JRWUBU2> <20170420201410.70377691@JRWUBU2> <20170421021905.5de801c5@JRWUBU2> <5994B215-DA07-4048-BE2E-06AB38D19ABA@umich.edu> <20170421082333.4232cf67@JRWUBU2> Message-ID: That seems like a relatively niche use case (especially with Vedic Sanskrit) compared to having weird selection for everything else. I'm not convinced. When I use a romanized Devanagari input method (I typically do on my laptop), deleting the whole cluster is necessary anyway for things to work well. Direct input methods do let you edit in a more granular way but I've never seen the need for that. I guess this boils down to a matter of opinion and anecdotal experience, so there's not much I can do to convince this list otherwise :) -Manish On Fri, Apr 21, 2017 at 12:23 AM, Richard Wordingham via Unicode wrote: > On Fri, 21 Apr 2017 00:08:24 -0500 > Anshuman Pandey via Unicode wrote: > >> > On Apr 20, 2017, at 8:19 PM, Richard Wordingham via Unicode >> > wrote: > >> > Now imagine you're >> > typing Vedic Sanskrit, with its clusters and pitch indicators. > >> I tried typing Vedic Sanskrit, and it seems to work: > >> http://pandey.pythonanywhere.com/devsyll > > That should demonstrate nothing relevant if you type correctly first > time. The issue comes when you mistype and have to correct, to give > the usual worst case, the first letter of a conjunct. Now, I looked at > your page in Firefox on Ubuntu, and I found the cursor seemed to move > by extended grapheme cluster. That means that to change a consonant > you have to retype the following marks. > > I did find two issues with your analyser. > > Firstly, it broke ??????????? into ????????????, which does not > concatenate back to the original. 
> > Secondly, you have a problem with ANUDATTA. You are not accepting > as a syllable. Perhaps you believed > https://www.microsoft.com/typography/OpenTypeDev/devanagari/intro.htm > as to the structure of a Devanagari syllable. I suspect ANUDATTA as a > consonant modifier went out when U+097B DEVANAGARI LETTER GGA and the > like came in. > > Richard. > From unicode at unicode.org Fri Apr 21 18:04:27 2017 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Sat, 22 Apr 2017 00:04:27 +0100 Subject: Counting Devanagari Aksharas In-Reply-To: References: <20170420003539.205acdb4@JRWUBU2> Message-ID: <20170422000427.2859fb8b@JRWUBU2> On Thu, 20 Apr 2017 11:17:05 -0700 Manish Goregaokar via Unicode wrote: > On Wed, Apr 19, 2017 at 4:35 PM, Richard Wordingham via Unicode > wrote: > > Is there consensus on how to count aksharas in the Devanagari > > script? The doubts I have relate to a visible halant in > > orthographic syllables other than the first. > I don't think there's consensus. I've found related discussion at https://lists.w3.org/Archives/Public/public-i18n-indic/. The question of how to count was raised and not answered there. > On Wed, Apr 19, 2017 at 4:35 PM, > Richard Wordingham via Unicode wrote: > > Is there consensus on how to count aksharas in the Devanagari > > script? The doubts I have relate to a visible halant in > > orthographic syllables other than the first. > I'm of the opinion that Unicode should start considering devanagari > (and possibly other indic) consonant clusters as single extended > grapheme clusters. Do Hindi speakers really think of orthographic syllables as characters? What may be useful is the concept of a definition of an orthographic syllable. It may be possible to get the information from a font - depending on the renderer - but a locale-dependent definition should be possible for use as a fall-back. 
Devanagari rules won't work for Tamil, and I think rules for Hindi and Nepali will be slightly different - looks like a problem. The concept is possibly not useful in some Indic scripts - the concept won't work well in Thai, but will work in Pali in the Thai script, for both Pali orthographies. Richard. From unicode at unicode.org Fri Apr 21 18:27:43 2017 From: unicode at unicode.org (Manish Goregaokar via Unicode) Date: Fri, 21 Apr 2017 16:27:43 -0700 Subject: Counting Devanagari Aksharas In-Reply-To: <20170422000427.2859fb8b@JRWUBU2> References: <20170420003539.205acdb4@JRWUBU2> <20170422000427.2859fb8b@JRWUBU2> Message-ID: > Do Hindi speakers really think of orthographic syllables as characters? When rendered as a cluster, yes? I've asked around, and folks seem to insist on coupling it to the rendering. Given most fonts render *normal* (common, etc) clusters, I think making them EGCs and looking at nonrendered clusters the same way we do family emoji is fine (family emojis of length 5 are a single EGC, but that's not what's actually perceived by the user, but it's a use case that's very rare in the wild, so it doesn't matter). The way I see it, the current system is wrong, and so would the proposed system of not breaking at viramas (or not breaking at viramas followed by a consonant if we want to be more precise), but the proposed system would be wrong much less often. I am only talking about Devanagari, though scripts like Bangla/Gujrati/Gurmukhi may have similar needs. Breaking on ZWNJ seems sensible. -Manish On Fri, Apr 21, 2017 at 4:04 PM, Richard Wordingham via Unicode wrote: > On Thu, 20 Apr 2017 11:17:05 -0700 > Manish Goregaokar via Unicode wrote: > >> On Wed, Apr 19, 2017 at 4:35 PM, Richard Wordingham via Unicode >> wrote: > >> > Is there consensus on how to count aksharas in the Devanagari >> > script? The doubts I have relate to a visible halant in >> > orthographic syllables other than the first. > >> I don't think there's consensus. 
> > I've found related discussion at > https://lists.w3.org/Archives/Public/public-i18n-indic/. The question > of how to count was raised and not answered there. > >> On Wed, Apr 19, 2017 at 4:35 PM, >> Richard Wordingham via Unicode wrote: >> > Is there consensus on how to count aksharas in the Devanagari >> > script? The doubts I have relate to a visible halant in >> > orthographic syllables other than the first. > >> I'm of the opinion that Unicode should start considering devanagari >> (and possibly other indic) consonant clusters as single extended >> grapheme clusters. > > Do Hindi speakers really think of orthographic syllables as characters? > > What may be useful is the concept of a definition of an orthographic > syllable. It may be possible to get the information from a font - > depending on the renderer - but a locale-dependent definition should be > possible for use as a fall-back. Devanagari rules won't work for > Tamil, and I think rules for Hindi and Nepali will be slightly > different - looks like a problem. > > The concept is possibly not useful in some Indic scripts - the concept > won't work well in Thai, but will work in Pali in the Thai script, for > both Pali orthographies. > > Richard. From unicode at unicode.org Sat Apr 22 05:13:16 2017 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Sat, 22 Apr 2017 11:13:16 +0100 Subject: Counting Devanagari Aksharas In-Reply-To: References: <20170420003539.205acdb4@JRWUBU2> <20170422000427.2859fb8b@JRWUBU2> Message-ID: <20170422111316.7c6f6d44@JRWUBU2> On Fri, 21 Apr 2017 16:27:43 -0700 Manish Goregaokar via Unicode wrote: > > Do Hindi speakers really think of orthographic syllables as > > characters? > > When rendered as a cluster, yes? I've asked around, and folks seem to > insist on coupling it to the rendering. That argues that it's a unit, which I don't think is in dispute. 
Words are also units, and nowadays we don't normally insist that one retype a word just to change one bit of it. > Given most fonts render > *normal* (common, etc) clusters, I think making them EGCs and looking > at nonrendered clusters the same way we do family emoji is fine > (family emojis of length 5 are a single EGC, but that's not what's > actually perceived by the user, but it's a use case that's very rare > in the wild, so it doesn't matter). That depends on the language. In the Tai Tham script, even without consonant clusters one can get 5 graphic characters in a syllable, e.g. ????? _cao_ 'lord; you (polite)', and when one adds consonant clusters one easily gets monosyllables like ??????? _kluai_ 'banana' with 5 graphic characters and additionally 2 coengs. (One can distinguish Pali from the Tai languages simply by the density of the ink!) At present these are split into two and three grapheme clusters respectively, and LibreOffice cursor movement responds accordingly. (SIGN AA starts a grapheme cluster in several scripts of further India.) However, if one teaches the Emacs editor what a Tai Tham syllable is, so that it can use the M17n rendering library, the cursor then advances syllable by syllable, which is unpleasant for imperfect typists. Fortunately, it's possible to add functions to Emacs to allow it to advance character-by-character; I forget if one has to also add a few code changes. (The downside is that text either side of the cursor is rendered independently, which can be a nuisance when editing very long lines.) > The way I see it, the current > system is wrong, and so would the proposed system of not breaking at > viramas (or not breaking at viramas followed by a consonant if we want > to be more precise), but the proposed system would be wrong much less > often. > I am only talking about Devanagari, though scripts like > Bangla/Gujrati/Gurmukhi may have similar needs. Breaking on ZWNJ seems > sensible. 
Indeed, viramas (InSC=Virama) will have to be handled case-by-case. One should continue to break after pulli (U+0BCD TAMIL SIGN VIRAMA) except for the cases of the ligatures/conjuncts. I don't know if there are obscure cases, or whether it's only _shri_ and for which one should not break just because of the virama. Continuation after coengs (InSC=Invisible_Stacker) should be automatic. Malayalam will need customisation. Definitions by codepoints are only a fallback, for when a font cannot be used to guide the process. Formally, normalisation is a problem, as these characters can be separated from letters by other marks. This is a problem in practice for normalised text in Tai Tham. Pure killers (InSC=Pure_Killer) should probably be given no special treatment, as at present, by default, though I wonder if we should define orthographic syllables for Pali in Thai script. The two orthographies will need different rules, and renderers won't help. Defining orthographic syllables for languages in the Latin script is probably excessive. Richard. From unicode at unicode.org Sat Apr 22 05:34:32 2017 From: unicode at unicode.org (Eli Zaretskii via Unicode) Date: Sat, 22 Apr 2017 13:34:32 +0300 Subject: Counting Devanagari Aksharas In-Reply-To: <20170422111316.7c6f6d44@JRWUBU2> (message from Richard Wordingham via Unicode on Sat, 22 Apr 2017 11:13:16 +0100) References: <20170420003539.205acdb4@JRWUBU2> <20170422000427.2859fb8b@JRWUBU2> <20170422111316.7c6f6d44@JRWUBU2> Message-ID: <83bmrorarr.fsf@gnu.org> > Date: Sat, 22 Apr 2017 11:13:16 +0100 > From: Richard Wordingham via Unicode > > At present these are split into two and three grapheme clusters > respectively, and LibreOffice cursor movement responds accordingly. > (SIGN AA starts a grapheme cluster in several scripts of further > India.) 
However, if one teaches the Emacs editor what a Tai Tham > syllable is, so that it can use the M17n rendering library, the cursor > then advances syllable by syllable, which is unpleasant for imperfect > typists. AFAIR, Emacs allows one to _delete_ individual characters, i.e. Backspace and C-d delete character-by-character, so the problem shouldn't be so grave for imperfect typists. Movement by grapheme cluster is AFAIK the most natural way of moving in complex scripts. From unicode at unicode.org Sat Apr 22 11:13:36 2017 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Sat, 22 Apr 2017 17:13:36 +0100 Subject: Counting Devanagari Aksharas In-Reply-To: <83bmrorarr.fsf@gnu.org> References: <20170420003539.205acdb4@JRWUBU2> <20170422000427.2859fb8b@JRWUBU2> <20170422111316.7c6f6d44@JRWUBU2> <83bmrorarr.fsf@gnu.org> Message-ID: <20170422171336.3d1bdc0e@JRWUBU2> On Sat, 22 Apr 2017 13:34:32 +0300 Eli Zaretskii via Unicode wrote: > AFAIR, Emacs allows one to _delete_ individual characters, > i.e. Backspace and C-d delete character-by-character, so the problem > shouldn't be so grave for imperfect typists. Deleting forwards by one _character_ certainly makes life less harsh. It's pleasanter than the UAX#29 suggestion, "For example, on a given system the backspace key might delete by code point, while the delete key may delete an entire cluster". > Movement by grapheme > cluster is AFAIK the most natural way of moving in complex scripts. Evidence? It's easiest for displaying the cursor. I've encountered the problem that, while at least I can search for text smaller than a cluster, there's no indication in the window of where in the window the text is. SIL's Graphite supports the idea of a split cursor, which shows the glyphs corresponding to the characters before and after the cursor position. Richard. 
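
The UAX#29 suggestion quoted above can be sketched as two editing primitives. This is only an illustration under stated assumptions: `cluster_end` is a hypothetical helper that treats a cluster as a base plus following marks (General_Category M*), and nothing here reflects how Emacs or any toolkit actually implements deletion.

```python
import unicodedata


def cluster_end(text, start):
    """End index of the approximate cluster that begins at start."""
    end = start + 1
    while end < len(text) and unicodedata.category(text[end]).startswith("M"):
        end += 1
    return end


def backspace(text, cursor):
    """Delete a single code point before the cursor."""
    if cursor == 0:
        return text, cursor
    return text[:cursor - 1] + text[cursor:], cursor - 1


def delete_forward(text, cursor):
    """Delete the whole (approximate) cluster after the cursor."""
    if cursor >= len(text):
        return text, cursor
    return text[:cursor] + text[cluster_end(text, cursor):], cursor


word = "\u0E01\u0E35\u0E48\u0E02"  # KO KAI, SARA II, MAI EK, KHO KHAI

# Backspace after the tone mark removes only the tone mark.
print(backspace(word, 3) == ("\u0E01\u0E35\u0E02", 2))  # True

# Forward delete at the start removes consonant, vowel and tone mark together.
print(delete_forward(word, 0) == ("\u0E02", 0))  # True
```

Deleting forwards by one code point, as described for Emacs above, would correspond to replacing `cluster_end(text, cursor)` with `cursor + 1`.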
From unicode at unicode.org Sat Apr 22 14:22:39 2017 From: unicode at unicode.org (Eli Zaretskii via Unicode) Date: Sat, 22 Apr 2017 22:22:39 +0300 Subject: Counting Devanagari Aksharas In-Reply-To: <20170422171336.3d1bdc0e@JRWUBU2> (message from Richard Wordingham via Unicode on Sat, 22 Apr 2017 17:13:36 +0100) References: <20170420003539.205acdb4@JRWUBU2> <20170422000427.2859fb8b@JRWUBU2> <20170422111316.7c6f6d44@JRWUBU2> <83bmrorarr.fsf@gnu.org> <20170422171336.3d1bdc0e@JRWUBU2> Message-ID: <83y3usp7r4.fsf@gnu.org> > Date: Sat, 22 Apr 2017 17:13:36 +0100 > From: Richard Wordingham via Unicode > > > Movement by grapheme > > cluster is AFAIK the most natural way of moving in complex scripts. > > Evidence? Personal experience? > It's easiest for displaying the cursor. It's the _only_ way of displaying the cursor. You cannot even meaningfully move by single characters in most clusters, because composing characters generally completely changes how the original characters looked, so there's nowhere you can display the cursor. And without being able to position the cursor, a visual feedback to the user becomes troublesome at best. > I've encountered the problem that, while at least I can search for > text smaller than a cluster, there's no indication in the window of > where in the window the text is. I could imagine Emacs decomposing characters temporarily when only part of a cluster matches the search string. Assuming this would make sense to users of some complex scripts, that is. You are welcome to suggest such a feature by using report-emacs-bug. > SIL's Graphite supports the idea of a split cursor, which > shows the glyphs corresponding to the characters before and after the > cursor position. I find split-cursor to be a nuisance, FWIW. IME, it confuses the users without making anything much clearer. 
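The search problem discussed above (a match that is smaller than a cluster, with nothing on screen to show where it is) can be illustrated with a short Python sketch that reports which clusters a match falls inside. It is a hypothetical illustration, not Emacs behaviour: `cluster_spans` and `clusters_touched_by_match` are invented names, and the clustering is the rough "base plus following marks" approximation rather than UAX#29.

```python
import unicodedata


def cluster_spans(text):
    """Return (start, end) index pairs for approximate clusters.

    Assumption: a cluster is a base character plus any following
    characters whose General_Category starts with 'M'.
    """
    spans = []
    for i, ch in enumerate(text):
        if spans and unicodedata.category(ch).startswith("M"):
            spans[-1] = (spans[-1][0], i + 1)  # extend the current cluster
        else:
            spans.append((i, i + 1))  # begin a new cluster
    return spans


def clusters_touched_by_match(text, needle):
    """Return the clusters overlapped by the first match of needle."""
    start = text.find(needle)
    if start < 0:
        return []
    end = start + len(needle)
    return [text[a:b] for a, b in cluster_spans(text) if a < end and b > start]


# Searching for a bare COMBINING ACUTE ACCENT inside "cafe" + U+0301 + "s".
hits = clusters_touched_by_match("cafe\u0301s", "\u0301")
print([[hex(ord(c)) for c in cluster] for cluster in hits])
```

An editor could highlight, or temporarily decompose, the clusters returned here so that the user can see where the match sits.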
From unicode at unicode.org Sat Apr 22 15:39:42 2017 From: unicode at unicode.org (Julian Bradfield via Unicode) Date: Sat, 22 Apr 2017 21:39:42 +0100 (BST) Subject: Counting Devanagari Aksharas References: <20170420003539.205acdb4@JRWUBU2> <20170422000427.2859fb8b@JRWUBU2> <20170422111316.7c6f6d44@JRWUBU2> <83bmrorarr.fsf@gnu.org> <20170422171336.3d1bdc0e@JRWUBU2> <83y3usp7r4.fsf@gnu.org> Message-ID: On 2017-04-22, Eli Zaretskii via Unicode wrote: >> From: Richard Wordingham via Unicode [...] >> I've encountered the problem that, while at least I can search for >> text smaller than a cluster, there's no indication in the window of >> where in the window the text is. > > I could imagine Emacs decomposing characters temporarily when only > part of a cluster matches the search string. Assuming this would make > sense to users of some complex scripts, that is. You are welcome to > suggest such a feature by using report-emacs-bug. That's what I do in my emacs with combining characters, and if I had complex script support, I'd expect the same to happen there. emacs is a programmer's editor, after all :) -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From unicode at unicode.org Sat Apr 22 18:51:59 2017 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Sun, 23 Apr 2017 00:51:59 +0100 Subject: Counting Devanagari Aksharas In-Reply-To: References: <20170420003539.205acdb4@JRWUBU2> <20170422000427.2859fb8b@JRWUBU2> <20170422111316.7c6f6d44@JRWUBU2> <83bmrorarr.fsf@gnu.org> <20170422171336.3d1bdc0e@JRWUBU2> <83y3usp7r4.fsf@gnu.org> Message-ID: <20170423005159.14ed0bac@JRWUBU2> On Sat, 22 Apr 2017 21:39:42 +0100 (BST) Julian Bradfield via Unicode wrote: > On 2017-04-22, Eli Zaretskii via Unicode wrote: > > I could imagine Emacs decomposing characters temporarily when only > > part of a cluster matches the search string. 
Assuming this would > > make sense to users of some complex scripts, that is. You are > > welcome to suggest such a feature by using report-emacs-bug. The cursor moves to the cluster boundary, so there is much less of a problem with Emacs. > That's what I do in my emacs with combining characters, and if I had > complex script support, I'd expect the same to happen there. > emacs is a programmer's editor, after all :) Emacs probably has a way of toggling complex script support somewhere. I'm torn between seeing the text properly set out and seeing exactly what it is that I've typed. 'Reveal codes' doesn't seem widely supported. Richard. From unicode at unicode.org Sat Apr 22 21:40:29 2017 From: unicode at unicode.org (Eli Zaretskii via Unicode) Date: Sun, 23 Apr 2017 05:40:29 +0300 Subject: Counting Devanagari Aksharas In-Reply-To: <20170423005159.14ed0bac@JRWUBU2> (message from Richard Wordingham via Unicode on Sun, 23 Apr 2017 00:51:59 +0100) References: <20170420003539.205acdb4@JRWUBU2> <20170422000427.2859fb8b@JRWUBU2> <20170422111316.7c6f6d44@JRWUBU2> <83bmrorarr.fsf@gnu.org> <20170422171336.3d1bdc0e@JRWUBU2> <83y3usp7r4.fsf@gnu.org> <20170423005159.14ed0bac@JRWUBU2> Message-ID: <83o9vnq21u.fsf@gnu.org> > Date: Sun, 23 Apr 2017 00:51:59 +0100 > Cc: Julian Bradfield > From: Richard Wordingham via Unicode > > On Sat, 22 Apr 2017 21:39:42 +0100 (BST) > Julian Bradfield via Unicode wrote: > > > On 2017-04-22, Eli Zaretskii via Unicode wrote: > > > > I could imagine Emacs decomposing characters temporarily when only > > > part of a cluster matches the search string. Assuming this would > > > make sense to users of some complex scripts, that is. You are > > > welcome to suggest such a feature by using report-emacs-bug. > > The cursor moves to the cluster boundary, so there is much less of a > problem with Emacs. But you wanted to highlight only part of the cluster, AFAIU. 
> > That's what I do in my emacs with combining characters, and if I had > > complex script support, I'd expect the same to happen there. > > emacs is a programmer's editor, after all :) > > Emacs probably has a way of toggling complex script support somewhere. > I'm torn between seeing the text properly set out and seeing exactly > what it is that I've typed. 'Reveal codes' doesn't seem widely > supported. "M-x auto-composition-mode RET" should do what you want. From unicode at unicode.org Sat Apr 22 23:25:02 2017 From: unicode at unicode.org (Manish Goregaokar via Unicode) Date: Sat, 22 Apr 2017 21:25:02 -0700 Subject: Counting Devanagari Aksharas In-Reply-To: <83y3usp7r4.fsf@gnu.org> References: <20170420003539.205acdb4@JRWUBU2> <20170422000427.2859fb8b@JRWUBU2> <20170422111316.7c6f6d44@JRWUBU2> <83bmrorarr.fsf@gnu.org> <20170422171336.3d1bdc0e@JRWUBU2> <83y3usp7r4.fsf@gnu.org> Message-ID: > You cannot even > meaningfully move by single characters in most clusters, because > composing characters generally completely changes how the original > characters looked, so there's nowhere you can display the cursor. Yes, and this is one of the reasons it feels broken in devanagari, you get cursors in the midst of aksharas, in weird places. Backspace in browsers (chrome and firefox) deletes within EGCs too. They delete matras in devanagari, and jamos in hangul. They don't *exactly* work off of code points (e.g. flag emoji gets deleted as a whole in many backspace implementations) -Manish
From unicode at unicode.org Sun Apr 23 01:22:38 2017 From: unicode at unicode.org (Asmus Freytag via Unicode) Date: Sat, 22 Apr 2017 23:22:38 -0700 Subject: Counting Devanagari Aksharas In-Reply-To: References: <20170420003539.205acdb4@JRWUBU2> <20170422000427.2859fb8b@JRWUBU2> <20170422111316.7c6f6d44@JRWUBU2> <83bmrorarr.fsf@gnu.org> <20170422171336.3d1bdc0e@JRWUBU2> <83y3usp7r4.fsf@gnu.org> Message-ID: <33d2d89d-b976-1afe-81a9-deb6d8cd63da@ix.netcom.com> An HTML attachment was scrubbed... URL: From unicode at unicode.org Sun Apr 23 14:06:26 2017 From: unicode at unicode.org (Naena Guru via Unicode) Date: Mon, 24 Apr 2017 00:36:26 +0530 Subject: Counting Devanagari Aksharas In-Reply-To: <20170420003539.205acdb4@JRWUBU2> References: <20170420003539.205acdb4@JRWUBU2> Message-ID: The Unicode approach to Sanskrit and all Indic is flawed. Indic should not be letter-assembly systems.
Sanskrit vyaakaraNa (grammar) explains the phonemes as the atoms of the speech. Each writing system then assigns a shape to the phonetically precise phoneme. The most technically and grammatically proper solution for Indic is first to ROMANIZE the group of writing systems at the level of phonemes. That is, assign romanized shapes to vowels, consonants, prenasals, post-vowel phonemes (anusvara and visarjaniiya with its allophones) etc. This approach is similar to how European languages picked up Latin, improvised the script and even uses Simples and Capitals repertoire. Romanizing immediately makes typing easier and eliminates sometimes embarrassing ambiguity in Anglicizing -- you type phonetically on key layouts close to QWERTY. (Only four positions are different in Romanized Sinhala layout). If we drop the capitalizing rules and utilize caps to indicate the 'other' forms of a common letter, we get an intuitively typed system for each language, and readable too. When this is done carefully, comparing phoneme sets of the languages, we can reach a common set of Latin-derived SINGLE-BYTE letters completely covering all phonemes of all Indic. Next, each native script can be obtained by making orthographic smart fonts that display the SBCS codes in the respective shapes of the native scripts. I have successfully romanized Sinhala and revived the full repertoire of Sinhla + Sanskrit orthography losing nothing. Sinhala script is perhaps the most complex of all Indic because it is used to write both Sanskrit and Pali. See this: http://ahangama.com/ (It's all SBCS underneath). Test here: http://ahangama.com/edit.htm On 4/20/2017 5:05 AM, Richard Wordingham via Unicode wrote: > Is there consensus on how to count aksharas in the Devanagari script? > The doubts I have relate to a visible halant in orthographic syllables > other than the first. 
> > For example, according to 'Devanagari VIP Team Issues Report' > http://www.unicode.org/L2/L2011/11370-devanagari-vip-issues.pdf, a > derived form from Nepali ???????? should be written ??????????? > DEVANAGARI LETTER RA, U+0940 DEVANAGARI VOWEL SIGN II, U+092E > DEVANAGARI LETTER MA, U+093E DEVANAGARI VOWEL SIGN AA, U+0928 > DEVANAGARI LETTER NA, U+094D, U+200C ZERO WIDTH NON-JOINER, U+0915 > DEVANAGARI LETTER KA, U+094B DEVANAGARI VOWEL SIGN O> and not > ?????????? U+094D, U+0915, U+094B>. Now, if the font used has a conjunct for > SHRA, I would count the former as having 4 aksharas SH.RII, MAA, N, KO > and the latter as having 3 aksharas SH.RII, MAA, N.KO. > > If the font leads to the use of a visible halant instead of the vattu > conjunct SH.RA, as happens when I view this email, would there then be > 5 and 4 aksharas respectively? A further complication is that the font > chosen treats what looks like SH, RA as a conjunct; the vowel I appears > to the left of SH when added after RA (????). > > Richard. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Sun Apr 23 16:59:49 2017 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Sun, 23 Apr 2017 22:59:49 +0100 Subject: Counting Devanagari Aksharas In-Reply-To: <83o9vnq21u.fsf@gnu.org> References: <20170420003539.205acdb4@JRWUBU2> <20170422000427.2859fb8b@JRWUBU2> <20170422111316.7c6f6d44@JRWUBU2> <83bmrorarr.fsf@gnu.org> <20170422171336.3d1bdc0e@JRWUBU2> <83y3usp7r4.fsf@gnu.org> <20170423005159.14ed0bac@JRWUBU2> <83o9vnq21u.fsf@gnu.org> Message-ID: <20170423225949.75f06e5a@JRWUBU2> On Sun, 23 Apr 2017 05:40:29 +0300 Eli Zaretskii via Unicode wrote: > > The cursor moves to the cluster boundary, so there is much less of a > > problem with Emacs. > > But you wanted to highlight only part of the cluster, AFAIU. If I search for CGJ, highlighting it is frequently supremely useless. 
I want to know where it is; highlighting is merely a tool to find it on the screen. Richard. From unicode at unicode.org Mon Apr 24 02:08:19 2017 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Mon, 24 Apr 2017 08:08:19 +0100 Subject: Counting Devanagari Aksharas In-Reply-To: References: <20170420003539.205acdb4@JRWUBU2> Message-ID: <20170424080819.34cfb34c@JRWUBU2> On Mon, 24 Apr 2017 00:36:26 +0530 Naena Guru via Unicode wrote: > The Unicode approach to Sanskrit and all Indic is flawed. Indic > should not be letter-assembly systems. > > Sanskrit vyaakaraNa (grammar) explains the phonemes as the atoms of > the speech. Each writing system then assigns a shape to the > phonetically precise phoneme. > > The most technically and grammatically proper solution for Indic is > first to ROMANIZE the group of writing systems at the level of > phonemes. That is, assign romanized shapes to vowels, consonants, > prenasals, post-vowel phonemes (anusvara and visarjaniiya with its > allophones) etc. This approach is similar to how European languages > picked up Latin, improvised the script and even uses Simples and > Capitals repertoire. Romanizing immediately makes typing easier and > eliminates sometimes embarrassing ambiguity in Anglicizing -- you > type phonetically on key layouts close to QWERTY. (Only four > positions are different in Romanized Sinhala layout). > > If we drop the capitalizing rules and utilize caps to indicate the > 'other' forms of a common letter, we get an intuitively typed system > for each language, and readable too. When this is done carefully, > comparing phoneme sets of the languages, we can reach a common set of > Latin-derived SINGLE-BYTE letters completely covering all phonemes of > all Indic. Unless this implies a spelling reform for many languages, I'd like to see how this works for the Tai Tham script. I'm not happy with the Romanisation I use to work round hostile rendering engines. 
(My scheme is only documented in variable hack_ss02 in the last script blocks of http://wrdingam.co.uk/lanna/denderer_test.htm.) For example, there are several different ways of writing what one might naively record as "ontarAy". > Next, each native script can be obtained by making orthographic smart > fonts that display the SBCS codes in the respective shapes of the > native scripts. That sounds like a letter-assembly system. So how does your scheme help one split words into orthographic syllables? > I have successfully romanized Sinhala and revived the full repertoire > of Sinhla + Sanskrit orthography losing nothing. Sinhala script is > perhaps the most complex of all Indic because it is used to write > both Sanskrit and Pali. What complication does Pali impose on top of Sanskrit. As far as I'm aware, it just needs one extra letter, usually called LLA, which you will already have if 'Sanskrit' includes Vedic Sanskrit. > See this: http://ahangama.com/ (It's all SBCS underneath). > Test here: http://ahangama.com/edit.htm All I get for these are blank pages. Perhaps there's an unreported communication failure in the network, Richard. From unicode at unicode.org Mon Apr 24 10:23:12 2017 From: unicode at unicode.org (Naena Guru via Unicode) Date: Mon, 24 Apr 2017 20:53:12 +0530 Subject: Go romanize! Re: Counting Devanagari Aksharas In-Reply-To: <20170423225949.75f06e5a@JRWUBU2> References: <20170420003539.205acdb4@JRWUBU2> <20170422000427.2859fb8b@JRWUBU2> <20170422111316.7c6f6d44@JRWUBU2> <83bmrorarr.fsf@gnu.org> <20170422171336.3d1bdc0e@JRWUBU2> <83y3usp7r4.fsf@gnu.org> <20170423005159.14ed0bac@JRWUBU2> <83o9vnq21u.fsf@gnu.org> <20170423225949.75f06e5a@JRWUBU2> Message-ID: Quote by Richard: Unless this implies a spelling reform for many languages, I'd like to see how this works for the Tai Tham script. I'm not happy with the Romanisation I use to work round hostile rendering engines. 
(My scheme is only documented in variable hack_ss02 in the last script blocks of http://wrdingam.co.uk/lanna/denderer_test.htm.) For example, there are several different ways of writing what one might naively record as "ontarAy". MY RESPONSE: Richard, I stuck to the two specifications (Unicode and Font) and Sanskrit grammar. The akSara has two aspects, its sound (zab?a, phoneme) and its shape (letter, ruupa). Reduce the writing system to its consonants, vowels etc. (zab?a) and assign SBCS letters/codes to them (ruupa). SBCS provides the best technical facilities for any language. (This is why now more than 130 languages romanize despite Unicode). Use English letters for similar sounds in the native speech. Now, treat all combinations as ligatures. For example, 'po' sound in Indic has the p consonant with a sign ahead plus a sign after. For the font, there is no difference between the way it makes the combination '?', which has a sign above and the Indic having two on either side. Recall that long ago, Unicode stopped defining fixed ligatures and asked the font makers to define them in the PUA. Spelling and speech: There is indeed a confusion about writing and reading in Hindi, as I have observed. Like in English and Tamil, Hindi tends to end words with a consonant. So, there is this habit among the Hindi speakers to drop the ending vowel, mostly 'a' from words that actually end with it. For example, the famous name Jayantha (miserable mine too, haha! = jayan?a as Romanized), is pronounced Jayanth by Hindi speakers. It is a Sanskrit word. Sanskrit and languages like Sinhala have vowel ending and are traditionally spoken as such. Dictionary is a commercial invention. When Caxton brought lead types to England, French-speaking Latin-flaunting elites did not care about the poor natives. Earlier, invading Romans forced them to drop Fuþark and adopt the 22-letter Latin alphabet. So, they improvised. Struck a line across d and made ð, Eth; added a sign to 'a' and made æ
(Asc) and continued using Thorn (þ) by rounding the loop. Lead type printing hit English for the second time, ruining it as the spell standardizing began. Dictionaries sold. THE POWERFUL CAN RUIN PEOPLE'S PROPERTY BECAUSE THEY CAN IN ORDER TO MAKE MONEY. Unicode enthusiasts, take heed! Looking at the word you gave, ontarAy, it looks to me like an Anglicized form. If I am to make a guess, its ending is like in ontarAyi. Is it said something like, own-the-raa-yi? (danger?) If I am right, this is a good example of decline of a writing system owing to bad, uncaring application of technology. We are in the Digital Age, and we need not compromise any more. In fact, we can fix errors and decadence introduced by past technologies. RICHARD: That sounds like a letter-assembly system. MY RESPONSE: Nothing assembled there, my friend. On 4/24/2017 12:38 PM, Richard Wordingham via Unicode wrote: > On Mon, 24 Apr 2017 00:36:26 +0530 > Naena Guru via Unicode wrote: > >> The Unicode approach to Sanskrit and all Indic is flawed. Indic >> should not be letter-assembly systems. >> >> Sanskrit vyaakaraNa (grammar) explains the phonemes as the atoms of >> the speech. Each writing system then assigns a shape to the >> phonetically precise phoneme. >> >> The most technically and grammatically proper solution for Indic is >> first to ROMANIZE the group of writing systems at the level of >> phonemes. That is, assign romanized shapes to vowels, consonants, >> prenasals, post-vowel phonemes (anusvara and visarjaniiya with its >> allophones) etc. This approach is similar to how European languages >> picked up Latin, improvised the script and even uses Simples and >> Capitals repertoire. Romanizing immediately makes typing easier and >> eliminates sometimes embarrassing ambiguity in Anglicizing -- you >> type phonetically on key layouts close to QWERTY. (Only four >> positions are different in Romanized Sinhala layout).
>> >> If we drop the capitalizing rules and utilize caps to indicate the >> 'other' forms of a common letter, we get an intuitively typed system >> for each language, and readable too. When this is done carefully, >> comparing phoneme sets of the languages, we can reach a common set of >> Latin-derived SINGLE-BYTE letters completely covering all phonemes of >> all Indic. > Unless this implies a spelling reform for many languages, I'd like to > see how this works for the Tai Tham script. I'm not happy with the > Romanisation I use to work round hostile rendering engines. (My > scheme is only documented in variable hack_ss02 in the last script > blocks of http://wrdingam.co.uk/lanna/denderer_test.htm.) For example, > there are several different ways of writing what one might naively > record as "ontarAy". > >> Next, each native script can be obtained by making orthographic smart >> fonts that display the SBCS codes in the respective shapes of the >> native scripts. > That sounds like a letter-assembly system. > > So how does your scheme help one split words into orthographic > syllables? > >> I have successfully romanized Sinhala and revived the full repertoire >> of Sinhla + Sanskrit orthography losing nothing. Sinhala script is >> perhaps the most complex of all Indic because it is used to write >> both Sanskrit and Pali. > What complication does Pali impose on top of Sanskrit. As far as I'm > aware, it just needs one extra letter, usually called LLA, which you > will already have if 'Sanskrit' includes Vedic Sanskrit. > >> See this: http://ahangama.com/ (It's all SBCS underneath). >> Test here: http://ahangama.com/edit.htm > All I get for these are blank pages. Perhaps there's an unreported > communication failure in the network, > > Richard. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From unicode at unicode.org Mon Apr 24 15:37:04 2017 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Mon, 24 Apr 2017 21:37:04 +0100 Subject: Go romanize! Re: Counting Devanagari Aksharas In-Reply-To: References: <20170420003539.205acdb4@JRWUBU2> <20170422000427.2859fb8b@JRWUBU2> <20170422111316.7c6f6d44@JRWUBU2> <83bmrorarr.fsf@gnu.org> <20170422171336.3d1bdc0e@JRWUBU2> <83y3usp7r4.fsf@gnu.org> <20170423005159.14ed0bac@JRWUBU2> <83o9vnq21u.fsf@gnu.org> <20170423225949.75f06e5a@JRWUBU2> Message-ID: <20170424213704.6c1882cc@JRWUBU2> On Mon, 24 Apr 2017 20:53:12 +0530 Naena Guru via Unicode wrote: > Quote by Richard: > Unless this implies a spelling reform for many languages, I'd like to > see how this works for the Tai Tham script. I'm not happy with the > Romanisation I use to work round hostile rendering engines. (My > scheme is only documented in variable hack_ss02 in the last script > blocks of http://wrdingam.co.uk/lanna/denderer_test.htm.) For > example, there are several different ways of writing what one might > naively record as "ontarAy". > > MY RESPONSE: > Richard, I stuck to the two specifications (Unicode and Font) and > Sanskrit grammar. The akSara has two aspects, its sound (zab?a, > phoneme) and its shape. (letter, ruupa). Reduce the writing system to > its consonants, vowels etc. (zab?a) and assign SBCS letters/codes to > them (ruupa). SBCS provides the best technical facilities for any > language. (This is why now more than 130 languages romanize despite > Unicode). Use English letters for similar sounds in the native > speech. Now, treat all combinations as ligatures. For example, 'po' > sound in Indic has the p consonant with a sign ahead plus a sign > after. In many Indic scripts, yes. In Devanagari, the vowel sign is normally a singly element classified as following the consonant. In Thai, the vowel sign precedes the consonant. Tai Tham uses both a two-part sign and a preceding sign. 
The preceding sign is for Tai words and the two-part sign for Pali words, but loanwords from Pali into the Tai languages may retain the two part sign. > For the font, there is no difference between the way it makes > the combination '?', which has a sign above and the Indic having two > on either side. For OpenType, there is. The first can be made by providing a simple table of where the diaeresis goes relative to the base characters, in this case the diaeresis. The second is painfully complicated, for the 'p' may have other marks attached to it, so doing it be relative positioning is painfully complicated and error-prone. This job is given to the rendering engine, which may introduce its own problems. AAT and Graphite offer the font maker the ability to move the 'sign ahead' from after the 'p' to before it. > Recall that long ago, Unicode stopped defining fixed > ligatures and asked the font makers to define them in the PUA. While the first is true enough, I believe the second is false. Not every glyph has to be mapped to by a single character. I don't do that for contextual forms or ligatures in my font. > Spelling and speech: > There is indeed a confusion about writing and reading in Hindi, as I > have observed. Like in English and Tamil, Hindi tends to end words > with a consonant. So, there is this habit among the Hindi speakers to > drop the ending vowel, mostly 'a' from words that actually end with > it. For example, the famous name Jayantha (miserable mine too, haha! > = jayan?a as Romanized), is pronounced Jayanth by Hindi speakers. It > is a Sanskrit word. Sanskrit and languages like Sinhhala have vowel > ending and are traditionally spoken as such. This loss is also to be found in Further India. Thai, Lao and Khmer now require that such a word-final vowel be written explicitly if it is still pronounced. > Looking at the word you gave, ontarAy, it looks to me like an > Anglicized form. If I am to make a guess, its ending is like in > ontarAyi. 
Is it said something like, own-the-raa-yi? (danger?) If I > am right, this is a good example of decline of a writing system owing > to bad, uncaring application of technology. We are in the Digital > Age, and we need not compromise any more. In fact, we can fix errors > and decadence introduced by past technologies. The word indeed means 'danger' (Pali/Sanskrit _antarāya_). The pronunciation is /?ont?ala?i/; the Tai languages that use(d) the Tai Tham script no longer have /r/. The older sequence /tr/ normally became /t?/ (except in Lao), but the spelling has not been updated - at least, not amongst the more literate. The script has a special symbol for the short vowel /o/, which it shares with the Lao script. This symbol is used in writing that word. Two ways I have seen it spelt, each with two orthographic syllables, are ????????? on-trAy (the second syllable has two stacks) and ????????? o-ntrAy. I have also seen a form closer to Pali, namely _antarAy_, written ???????? a-nta-rAy. However, I have seen nothing that shows that I won't encounter ????????? a-nta-rAy with the first vowel written explicitly, or even ????????? an-ta-rAy. How does your scheme distinguish such alternatives? Richard. From unicode at unicode.org Tue Apr 25 11:41:58 2017 From: unicode at unicode.org (Naena Guru via Unicode) Date: Tue, 25 Apr 2017 22:11:58 +0530 Subject: Go romanize! Re: Counting Devanagari Aksharas In-Reply-To: <20170424213704.6c1882cc@JRWUBU2> References: <20170420003539.205acdb4@JRWUBU2> <20170422000427.2859fb8b@JRWUBU2> <20170422111316.7c6f6d44@JRWUBU2> <83bmrorarr.fsf@gnu.org> <20170422171336.3d1bdc0e@JRWUBU2> <83y3usp7r4.fsf@gnu.org> <20170423005159.14ed0bac@JRWUBU2> <83o9vnq21u.fsf@gnu.org> <20170423225949.75f06e5a@JRWUBU2> <20170424213704.6c1882cc@JRWUBU2> Message-ID: <475e6fe8-ce7c-b6a6-f954-ba54f447d17f@gmail.com> Quote from below: The word indeed means 'danger' (Pali/Sanskrit _antarāya_).
The pronunciation is /?ont?ala?i/; the Tai languages that use(d) the Tai Tham script no longer have /r/. The older sequence /tr/ normally became /t?/ (except in Lao), but the spelling has not been updated - at least, not amongst the more literate. The script has a special symbol for the short vowel /o/, which it shares with the Lao script. This symbol is used in writing that word. Two ways I have seen it spelt, each with two orthographic syllables, are ????????? on-trAy (the second syllable has two stacks) and ????????? o-ntrAy. I have also seen a form closer to Pali, namely _antarAy_, written ???????? a-nta-rAy. However, I have seen nothing that shows that I won't encounter ????????? a-nta-rAy with the first vowel written explicitly, or even ????????? an-ta-rAy. How does your scheme distinguish such alternatives? Response: Perhaps this word is derived from Sanskrit 'an?ara?a' (Search: antarada at http://www.sanskrit-lexicon.uni-koeln.de/cgi-bin/tamil/recherche) Sinhala:an?araa?aayakayi, an?araava, an?araavayi, an?raava, an?raavayi Use this font to read the above Sinhala words: http://smartfonts.net/ttf/aruna.ttf -=- svas?i si??ham! -=- On 4/25/2017 2:07 AM, Richard Wordingham via Unicode wrote: > On Mon, 24 Apr 2017 20:53:12 +0530 > Naena Guru via Unicode wrote: > >> Quote by Richard: >> Unless this implies a spelling reform for many languages, I'd like to >> see how this works for the Tai Tham script. I'm not happy with the >> Romanisation I use to work round hostile rendering engines. (My >> scheme is only documented in variable hack_ss02 in the last script >> blocks ofhttp://wrdingam.co.uk/lanna/denderer_test.htm.) For >> example, there are several different ways of writing what one might >> naively record as "ontarAy". >> >> MY RESPONSE: >> Richard, I stuck to the two specifications (Unicode and Font) and >> Sanskrit grammar. The akSara has two aspects, its sound (zab?a, >> phoneme) and its shape. (letter, ruupa). 
Reduce the writing system to >> its consonants, vowels etc. (zab?a) and assign SBCS letters/codes to >> them (ruupa). SBCS provides the best technical facilities for any >> language. (This is why now more than 130 languages romanize despite >> Unicode). Use English letters for similar sounds in the native >> speech. Now, treat all combinations as ligatures. For example, 'po' >> sound in Indic has the p consonant with a sign ahead plus a sign >> after. > In many Indic scripts, yes. In Devanagari, the vowel sign is normally > a singly element classified as following the consonant. In Thai, the > vowel sign precedes the consonant. Tai Tham uses both a two-part sign > and a preceding sign. The preceding sign is for Tai words and the > two-part sign for Pali words, but loanwords from Pali into the Tai > languages may retain the two part sign. > >> For the font, there is no difference between the way it makes >> the combination '?', which has a sign above and the Indic having two >> on either side. > For OpenType, there is. The first can be made by providing a > simple table of where the diaeresis goes relative to the base > characters, in this case the diaeresis. The second is painfully > complicated, for the 'p' may have other marks attached to it, so doing > it be relative positioning is painfully complicated and error-prone. > This job is given to the rendering engine, which may introduce its own > problems. > > AAT and Graphite offer the font maker the ability to move the 'sign > ahead' from after the 'p' to before it. > >> Recall that long ago, Unicode stopped defining fixed >> ligatures and asked the font makers to define them in the PUA. > While the first is true enough, I believe the second is false. Not > every glyph has to be mapped to by a single character. I don't do that > for contextual forms or ligatures in my font. > >> Spelling and speech: >> There is indeed a confusion about writing and reading in Hindi, as I >> have observed. 
>> Like in English and Tamil, Hindi tends to end words
>> with a consonant. So, there is this habit among the Hindi speakers to
>> drop the ending vowel, mostly 'a' from words that actually end with
>> it. For example, the famous name Jayantha (miserable mine too, haha!
>> = jayan?a as Romanized), is pronounced Jayanth by Hindi speakers. It
>> is a Sanskrit word. Sanskrit and languages like Sinhala have vowel
>> ending and are traditionally spoken as such.
> This loss is also to be found in Further India. Thai, Lao and Khmer
> now require that such a word-final vowel be written explicitly if it is
> still pronounced.
>
>> Looking at the word you gave, ontarAy, it looks to me like an
>> Anglicized form. If I am to make a guess, its ending is like in
>> ontarAyi. Is it said something like, own-the-raa-yi? (danger?) If I
>> am right, this is a good example of decline of a writing system owing
>> to bad, uncaring application of technology. We are in the Digital
>> Age, and we need not compromise any more. In fact, we can fix errors
>> and decadence introduced by past technologies.
> The word indeed means 'danger' (Pali/Sanskrit _antarāya_). The
> pronunciation is /?ont?ala?i/; the Tai languages that use(d) the Tai
> Tham script no longer have /r/. The older sequence /tr/ normally
> became /t?/ (except in Lao), but the spelling has not been updated - at
> least, not amongst the more literate. The script has a special symbol
> for the short vowel /o/, which it shares with the Lao script. This
> symbol is used in writing that word. Two ways I have seen it spelt,
> each with two orthographic syllables, are ????????? on-trAy (the second
> syllable has two stacks) and ????????? o-ntrAy. I have also seen a
> form closer to Pali, namely _antarAy_, written ???????? a-nta-rAy.
> However, I have seen nothing that shows that I won't encounter
> ????????? a-nta-rAy with the first vowel written explicitly, or even
> ????????? an-ta-rAy.
> How does your scheme distinguish such alternatives?
>
> Richard.

From unicode at unicode.org Wed Apr 26 00:48:13 2017
From: unicode at unicode.org (Eli Zaretskii via Unicode)
Date: Wed, 26 Apr 2017 08:48:13 +0300
Subject: Counting Devanagari Aksharas
In-Reply-To: <20170423225949.75f06e5a@JRWUBU2> (message from Richard Wordingham on Sun, 23 Apr 2017 22:59:49 +0100)
References: <20170420003539.205acdb4@JRWUBU2> <20170422000427.2859fb8b@JRWUBU2> <20170422111316.7c6f6d44@JRWUBU2> <83bmrorarr.fsf@gnu.org> <20170422171336.3d1bdc0e@JRWUBU2> <83y3usp7r4.fsf@gnu.org> <20170423005159.14ed0bac@JRWUBU2> <83o9vnq21u.fsf@gnu.org> <20170423225949.75f06e5a@JRWUBU2>
Message-ID: <83efwfpvmq.fsf@gnu.org>

> Date: Sun, 23 Apr 2017 22:59:49 +0100
> From: Richard Wordingham
> Cc: Eli Zaretskii
>
> If I search for CGJ, highlighting it is frequently supremely useless.
> I want to know where it is; highlighting is merely a tool to find it on
> the screen.

So I guess this means highlighting is useful after all ;-)

From unicode at unicode.org Wed Apr 26 01:45:07 2017
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Wed, 26 Apr 2017 07:45:07 +0100
Subject: Counting Devanagari Aksharas
In-Reply-To: <83efwfpvmq.fsf@gnu.org>
References: <20170420003539.205acdb4@JRWUBU2> <20170422000427.2859fb8b@JRWUBU2> <20170422111316.7c6f6d44@JRWUBU2> <83bmrorarr.fsf@gnu.org> <20170422171336.3d1bdc0e@JRWUBU2> <83y3usp7r4.fsf@gnu.org> <20170423005159.14ed0bac@JRWUBU2> <83o9vnq21u.fsf@gnu.org> <20170423225949.75f06e5a@JRWUBU2> <83efwfpvmq.fsf@gnu.org>
Message-ID: <20170426074507.6f50fffd@JRWUBU2>

On Wed, 26 Apr 2017 08:48:13 +0300
Eli Zaretskii via Unicode wrote:

> > Date: Sun, 23 Apr 2017 22:59:49 +0100
> > From: Richard Wordingham
> > Cc: Eli Zaretskii
> >
> > If I search for CGJ, highlighting it is frequently supremely
> > useless.
> > I want to know where it is; highlighting is merely a tool
> > to find it on the screen.
>
> So I guess this means highlighting is useful after all ;-)

Not if the area highlit is zero pixels wide.

Richard.

From unicode at unicode.org Wed Apr 26 05:56:08 2017
From: unicode at unicode.org (Eli Zaretskii via Unicode)
Date: Wed, 26 Apr 2017 13:56:08 +0300
Subject: Counting Devanagari Aksharas
In-Reply-To: <20170426074507.6f50fffd@JRWUBU2> (message from Richard Wordingham via Unicode on Wed, 26 Apr 2017 07:45:07 +0100)
References: <20170420003539.205acdb4@JRWUBU2> <20170422000427.2859fb8b@JRWUBU2> <20170422111316.7c6f6d44@JRWUBU2> <83bmrorarr.fsf@gnu.org> <20170422171336.3d1bdc0e@JRWUBU2> <83y3usp7r4.fsf@gnu.org> <20170423005159.14ed0bac@JRWUBU2> <83o9vnq21u.fsf@gnu.org> <20170423225949.75f06e5a@JRWUBU2> <83efwfpvmq.fsf@gnu.org> <20170426074507.6f50fffd@JRWUBU2>
Message-ID: <83y3uno2t3.fsf@gnu.org>

> Date: Wed, 26 Apr 2017 07:45:07 +0100
> From: Richard Wordingham via Unicode
>
> On Wed, 26 Apr 2017 08:48:13 +0300
> Eli Zaretskii via Unicode wrote:
>
> > > Date: Sun, 23 Apr 2017 22:59:49 +0100
> > > From: Richard Wordingham
> > > Cc: Eli Zaretskii
> > >
> > > If I search for CGJ, highlighting it is frequently supremely
> > > useless. I want to know where it is; highlighting is merely a tool
> > > to find it on the screen.
> >
> > So I guess this means highlighting is useful after all ;-)
>
> Not if the area highlit is zero pixels wide.

If you elide too much of the context, the discussion could lose all of its meaning. Let me restore some of the relevant context:

> > > > > On 2017-04-22, Eli Zaretskii via Unicode wrote:
> > > >
> > > > > > I could imagine Emacs decomposing characters temporarily when only
> > > > > > part of a cluster matches the search string. Assuming this would
> > > > > > make sense to users of some complex scripts, that is. You are
> > > > > > welcome to suggest such a feature by using report-emacs-bug.
> > > >
> > > > The cursor moves to the cluster boundary, so there is much less of a
> > > > problem with Emacs.
> > >
> > > But you wanted to highlight only part of the cluster, AFAIU.
> >
> > If I search for CGJ, highlighting it is frequently supremely useless.
> > I want to know where it is; highlighting is merely a tool to find it on
> > the screen.
>
> So I guess this means highlighting is useful after all ;-)

IOW, the context was a suggestion to temporarily disable character composition, in which case CGJ _will_ be displayed as a non-zero-width glyph, at least in the default Emacs display configuration, and CGJ _will_ be visible with its highlight.

From unicode at unicode.org Thu Apr 27 03:27:55 2017
From: unicode at unicode.org (Srinidhi A via Unicode)
Date: Thu, 27 Apr 2017 13:57:55 +0530
Subject: Tibetan Paluta
Message-ID:

The annotation of 0F85 ྅ TIBETAN MARK PALUTA says it is used for avagraha. However, it seems this character denotes pluta instead of avagraha. Pluta is used for indicating elongation of a vowel.
A similar character with an identical glyph is encoded in Soyombo (11A9D) with the name pluta. These characters are likely derived from digit ३, as ३ is used in Devanagari for indicating pluta.

Figure 2 of L2/16-016 shows the usage of TIBETAN MARK PALUTA for pluta.
What is the correct spelling in the Tibetan language, Paluta or Pluta?
Can Tibetan scholars clarify the usage of the above character?
If 0F85 is used for pluta, are there any distinct characters denoting avagraha in Tibetan script?

Srinidhi A

From unicode at unicode.org Thu Apr 27 10:50:12 2017
From: unicode at unicode.org (Paul Hackett via Unicode)
Date: Thu, 27 Apr 2017 11:50:12 -0400
Subject: Tibetan Paluta
In-Reply-To:
References:
Message-ID: <4660111265497340049@unknownmsgid>

Dear Srinidhi,

I can confirm for you that the character represented by 0F85 ྅
TIBETAN MARK PALUTA is, in fact, used to represent avagraha in Tibetan script. Someone else will have to speak to the source for the name of this character.

Regards,

Paul Hackett
Columbia University

On Apr 27, 2017, at 10:58 AM, Srinidhi A via Unicode wrote:

The annotation of 0F85 ྅ TIBETAN MARK PALUTA says it is used for avagraha. However, it seems this character denotes pluta instead of avagraha. Pluta is used for indicating elongation of a vowel.
A similar character with an identical glyph is encoded in Soyombo (11A9D) with the name pluta. These characters are likely derived from digit ३, as ३ is used in Devanagari for indicating pluta.

Figure 2 of L2/16-016 shows the usage of TIBETAN MARK PALUTA for pluta.
What is the correct spelling in the Tibetan language, Paluta or Pluta?
Can Tibetan scholars clarify the usage of the above character?
If 0F85 is used for pluta, are there any distinct characters denoting avagraha in Tibetan script?

Srinidhi A

From unicode at unicode.org Thu Apr 27 14:00:15 2017
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Thu, 27 Apr 2017 20:00:15 +0100
Subject: Tibetan Paluta
In-Reply-To:
References:
Message-ID: <20170427200015.19332c09@JRWUBU2>

On Thu, 27 Apr 2017 13:57:55 +0530
Srinidhi A via Unicode wrote:

> The annotation of 0F85 ྅ TIBETAN MARK PALUTA says it is used for
> avagraha. However it seems this character denotes pluta instead of
> avagraha. Pluta is used for indicating elongation of vowel.
> Similar character with identical glyph is encoded in Soyombo (11A9D)
> with name as pluta. These characters are likely derived from digit ३
> as ३ is used in Devanagari for indicating pluta.

Avagraha has two rather different uses. One is to mark elongation, which is its primary usage in living languages. The other is to mark prodelision, or arguably contraction, as in the Sanskrit sandhi of -e a- and -o a-.
Max Müller's comments make me wonder if he did much to popularise (create?) the latter usage.

Richard.

From unicode at unicode.org Sat Apr 29 14:21:20 2017
From: unicode at unicode.org (Naena Guru via Unicode)
Date: Sun, 30 Apr 2017 00:51:20 +0530
Subject: Tibetan Paluta
In-Reply-To:
References:
Message-ID: <1206d55a-0725-5488-bbe0-af0a0c3f3d10@gmail.com>

Just about the name paluta: In Sanskrit, the length of vowels is measured in maa?ra (a cognate of the word 'meter'). It is the spoken length of a short vowel. In Latin it is termed mora. Usually, you have only single and double length vowels. A palu?a length is like when you call out to somebody from a distance. Pluta is a careless use of spelling. Virama and Halanta are two other terms loosely used.

Anyway, Unicode is only about DISPLAYING a script: There's a shape here; let's find how to get it by assembling other shapes or by creating a code point for it. What is short, long or longer in speech is no concern for Unicode.

On 4/27/2017 1:57 PM, Srinidhi A via Unicode wrote:
> The annotation of 0F85 ྅ TIBETAN MARK PALUTA says it is used for
> avagraha. However it seems this character denotes pluta instead of
> avagraha. Pluta is used for indicating elongation of vowel.
> Similar character with identical glyph is encoded in Soyombo (11A9D)
> with name as pluta. These characters are likely derived from digit ३ as
> ३ is used in Devanagari for indicating pluta.
>
> Figure 2 of L2/16-016 shows the usage of TIBETAN MARK PALUTA for Pluta.
> What is the correct spelling in Tibetan language Paluta or Pluta?
> Can Tibetan scholars clarify the usage of above character?
> If 0F85 is used for Pluta, are there any distinct characters denoting
> avagraha in Tibetan script.
>
> Srinidhi A
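
For anyone following the thread, the formal character names mentioned above can be checked against the Unicode Character Database. What follows is only a minimal sketch, assuming Python's standard unicodedata module (nothing the posters themselves used); the helper name describe is hypothetical, and the Soyombo lookup falls back to a placeholder string on builds whose Unicode data predates the Soyombo block. It confirms the names only and says nothing about whether 0F85 is used as pluta or as avagraha.

```python
import unicodedata

def describe(cp):
    """Return ("U+XXXX", name) for a code point.

    Hypothetical helper for illustration; the name falls back to a
    placeholder if the local Unicode database does not know the character.
    """
    name = unicodedata.name(chr(cp), "<not in this UCD version>")
    return (f"U+{cp:04X}", name)

tibetan = describe(0x0F85)       # the character discussed in the thread
soyombo = describe(0x11A9D)      # the Soyombo counterpart cited by Srinidhi
digit_three = describe(0x0969)   # Devanagari digit three
digit_value = unicodedata.digit(chr(0x0969))  # numeric value of that digit

for label, name in (tibetan, soyombo, digit_three):
    print(label, name)
print("value of U+0969 as a digit:", digit_value)
```

The formal name stays PALUTA whatever the outcome of the discussion, since Unicode character names are never changed once assigned; only the annotation could be revised.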