From wjgo_10009 at btinternet.com Mon Mar 4 07:22:32 2024
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Mon, 4 Mar 2024 13:22:32 +0000 (GMT)
Subject: Could Unicode deliver the level of paleographic detail needed for encoding ancient Egyptian hieroglyphs?
Message-ID: <70349490.62d6.18e09a157f3.Webtop.119@btinternet.com>

I have no expertise in Egyptology; I am, however, interested in Unicode encoding research.

I have been reading.

https://www.unicode.org/L2/L2024/24045-ancient-egyptian-rotations.pdf

I opine that Unicode could possibly deliver that level of palaeographic detail if a custom virtual machine is defined and then implemented in the rendering system.

The rotation, and any other movements and scaling, being encoded in a sequence of software-like commands, expressed using tag characters, being included in the plain text sequence.

Software-like yet no loops, jumps or calls, so more like a list of hand-entered commands to a calculator.

The glyphs that are manipulated would be obtained from the font. The obeying of the software-like sequences would be by a virtual machine in the rendering application.

Once implemented, an end user would be able to specify rotation of a glyph by, say, 20 degrees, using a tag character sequence of something like

20Gr

after the Unicode code point of the character.

The two character tag sequence Gr being the command to the virtual machine to rotate the glyph by the number of degrees previously stated.

Scaling by 25% then rotating by 20 degrees by a tag sequence something like

25Gs20Gr

after the Unicode code point of the character.

The specification would need to state about which point the glyph is scaled and about which point the glyph is rotated.

Commands such as Gh and Gv for horizontal and vertical movements respectively, with the specification stating how to specify the distance: for example, percentage of the width of a unit square.
There could be a Gi command to encode movement in and out if so desired.

Both positive numbers and negative numbers could be used for rotations and movements, so rotations could be both clockwise and counterclockwise, movements could be right, left, up, down, in, out.

Other commands could be added as required by experts who have knowledge about the hieroglyphs.

A hieroglyph made up of various glyphs (each of which could be scaled, rotated, located) could be specified by a tag sequence between U+E007B TAG LEFT CURLY BRACKET and U+E007D TAG RIGHT CURLY BRACKET.

William Overington

Monday 4 March 2024

From asmusf at ix.netcom.com Mon Mar 4 14:44:08 2024
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Mon, 4 Mar 2024 12:44:08 -0800
Subject: Could Unicode deliver the level of paleographic detail needed for encoding ancient Egyptian hieroglyphs?
In-Reply-To: <70349490.62d6.18e09a157f3.Webtop.119@btinternet.com>
References: <70349490.62d6.18e09a157f3.Webtop.119@btinternet.com>
Message-ID: <0ebfbd08-f3fa-40ac-a74a-2a77134fac13@ix.netcom.com>

What you are describing is rich text. Anytime you add "special commands", no matter how you encode them, you have rich text. (There is a small amount of gray zone, in which characters like SHY, NBSP and TAB can be understood as still being "plain text", but a syntax for a virtual rotation machine is definitely beyond the scope).

A./

On 3/4/2024 5:22 AM, William_J_G Overington via Unicode wrote:
> I have no expertise in Egyptology; I am, however, interested in Unicode encoding research.
>
> I have been reading.
>
> https://www.unicode.org/L2/L2024/24045-ancient-egyptian-rotations.pdf
>
> I opine that Unicode could possibly deliver that level of palaeographic detail if a custom virtual machine is defined and then implemented in the rendering system.
>
> The rotation, and any other movements and scaling, being encoded in a sequence of software-like commands, expressed using tag characters, being included in the plain text sequence.
>
> Software-like yet no loops, jumps or calls, so more like a list of hand-entered commands to a calculator.
>
> The glyphs that are manipulated would be obtained from the font. The obeying of the software-like sequences would be by a virtual machine in the rendering application.
>
> Once implemented, an end user would be able to specify rotation of a glyph by, say, 20 degrees, using a tag character sequence of something like
>
> 20Gr
>
> after the Unicode code point of the character.
>
> The two character tag sequence Gr being the command to the virtual machine to rotate the glyph by the number of degrees previously stated.
>
> Scaling by 25% then rotating by 20 degrees by a tag sequence something like
>
> 25Gs20Gr
>
> after the Unicode code point of the character.
>
> The specification would need to state about which point the glyph is scaled and about which point the glyph is rotated.
>
> Commands such as Gh and Gv for horizontal and vertical movements respectively, with the specification stating how to specify the distance: for example, percentage of the width of a unit square. There could be a Gi command to encode movement in and out if so desired.
>
> Both positive numbers and negative numbers could be used for rotations and movements, so rotations could be both clockwise and counterclockwise, movements could be right, left, up, down, in, out.
>
> Other commands could be added as required by experts who have knowledge about the hieroglyphs.
>
> A hieroglyph made up of various glyphs (each of which could be scaled, rotated, located) could be specified by a tag sequence between
>
> U+E007B TAG LEFT CURLY BRACKET
>
> and
>
> U+E007D TAG RIGHT CURLY BRACKET.
>
> William Overington
>
> Monday 4 March 2024

From wjgo_10009 at btinternet.com Mon Mar 4 16:13:25 2024
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Mon, 4 Mar 2024 22:13:25 +0000 (GMT)
Subject: Could Unicode deliver the level of paleographic detail needed for encoding ancient Egyptian hieroglyphs?
Message-ID: <300bb5cd.72c5.18e0b8762df.Webtop.119@btinternet.com>

Asmus Freytag wrote as follows:

> What you are describing is rich text. Anytime you add "special commands", no matter how you encode them, you have rich text. (There is a small amount of gray zone, in which characters like SHY, NBSP and TAB can be understood as still being "plain text", but a syntax for a virtual rotation machine is definitely beyond the scope).

Thank you for replying.

Is what I suggest any less plain text than is the tag sequence specified for the Welsh flag? If so, why?

At present, if I understand it correctly, special commands have been added into plain text in the form of extra characters for formatting hieroglyphs. I suggest that using this software-like approach allows far greater possibilities than can reasonably be produced by adding an extra character into Unicode for each possibility. This software-like approach means that any rotation angle can be specified.

I opine that if the scope of plain text in a definition from long ago needs to be changed so as to allow progress into the future then that needs to be done.

Since making my earlier post, I have realized that a command Gm needs to be added so that a glyph can be horizontally mirrored if so desired. Other commands and features can be added straightforwardly to this software-like system.

I opine that whether the method that I have suggested goes forward and becomes part of Unicode should depend on whether the Egyptologists who would use it regard it as being of benefit.
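
To make the suggested notation concrete, here is a minimal Python sketch of how a rendering application might decode such a tag sequence. It is an illustration under stated assumptions only: the command letters (Gs, Gr, Gm) and the "number, then command" order come from this thread, while the function names `to_tag_chars`, `split_tag_suffix` and `parse_commands`, the restriction to integers, and the treatment of Gm as taking no number are hypothetical choices made for the example. None of this is part of Unicode.

```python
import re
from typing import List, Optional, Tuple

TAG_OFFSET = 0xE0000  # tag characters U+E0020..U+E007E mirror ASCII 0x20..0x7E


def to_tag_chars(ascii_text: str) -> str:
    """Encode printable ASCII as the corresponding Unicode tag characters."""
    return "".join(chr(TAG_OFFSET + ord(c)) for c in ascii_text)


def split_tag_suffix(text: str) -> Tuple[str, str]:
    """Split text into its leading part and its trailing run of tag
    characters, with that run decoded back to ASCII."""
    i = len(text)
    while i > 0 and 0xE0020 <= ord(text[i - 1]) <= 0xE007E:
        i -= 1
    return text[:i], "".join(chr(ord(c) - TAG_OFFSET) for c in text[i:])


# Hypothetical syntax: an optional signed integer, then "G" and one letter.
COMMAND = re.compile(r"(-?\d+)?G([a-z])")


def parse_commands(ascii_commands: str) -> List[Tuple[str, Optional[int]]]:
    """Return (command letter, argument) pairs in the order they appear."""
    return [
        (m.group(2), int(m.group(1)) if m.group(1) else None)
        for m in COMMAND.finditer(ascii_commands)
    ]


# U+13000 EGYPTIAN HIEROGLYPH A001 followed by "25Gs20Gr" as tag characters.
base, commands = split_tag_suffix("\U00013000" + to_tag_chars("25Gs20Gr"))
print(hex(ord(base)), parse_commands(commands))
```

The sketch only decodes the sequence into a command list; what a renderer would then do with each command is exactly the part that a specification would have to define.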
William Overington

Monday 4 March 2024

From jameskass at code2001.com Tue Mar 5 14:45:54 2024
From: jameskass at code2001.com (James Kass)
Date: Tue, 5 Mar 2024 20:45:54 +0000
Subject: Could Unicode deliver the level of paleographic detail needed for encoding ancient Egyptian hieroglyphs?
In-Reply-To: <0ebfbd08-f3fa-40ac-a74a-2a77134fac13@ix.netcom.com>
References: <70349490.62d6.18e09a157f3.Webtop.119@btinternet.com> <0ebfbd08-f3fa-40ac-a74a-2a77134fac13@ix.netcom.com>
Message-ID:

On 2024-03-04 8:44 PM, Asmus Freytag via Unicode wrote:
> What you are describing is rich text. Anytime you add "special commands", no matter how you encode them, you have rich text. (There is a small amount of gray zone, in which characters like SHY, NBSP and TAB can be understood as still being "plain text", but a syntax for a virtual rotation machine is definitely beyond the scope).
>
> A./

Egyptologists were (and are) involved in the process to encode hieroglyphics. Any shortcomings in the encoding should be addressed by those experts. People involved in the decisions understand the needs of the user community as well as the Unicode principles for character encoding. The document makes a convincing argument against enabling a plethora of rotations willy-nilly at the plain-text level.

Plain-text legibility is the issue for Unicode. Whether there is a semantic difference between a glyph rotated by 48 degrees and one rotated by 49 degrees is not for me to say, but it seems unlikely. It is up to the experts to determine whether a string of hieroglyphic characters is legible. So it would be the better practice to first determine how the user community is handling any issue before devising any mark-up scheme just because somebody might consider it useful some day.

Regardless of whether fine glyph rotation gets handled at the plain-text or rich-text level, one of the arguments against fine rotation appears to be based on a misconception related to fonts and rendering.

Here's one example of this misconception from the document:
https://www.unicode.org/L2/L2024/24045-ancient-egyptian-rotations.pdf

"Rotations entail a stark increase in the size of OpenType fonts, as such fonts cannot dynamically rotate glyphs, and therefore need to store rotated copies. To avoid the blow-up in size that would result if all rotations for all signs were included, a selection of rotations for a selection of signs is registered, and fonts would only be expected to implement those."

Glyph data in TrueType/OpenType fonts is stored as a series of Cartesian points. Rotation by any valid degree is accomplished by formula. The font engine is the logical place for implementing any transformative formula, for example, font size. Thus, any glyph from any font could be rotated by any degree dynamically with no impact on the font file size.

From asmusf at ix.netcom.com Tue Mar 5 16:19:32 2024
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Tue, 5 Mar 2024 14:19:32 -0800
Subject: Could Unicode deliver the level of paleographic detail needed for encoding ancient Egyptian hieroglyphs?
In-Reply-To:
References: <70349490.62d6.18e09a157f3.Webtop.119@btinternet.com> <0ebfbd08-f3fa-40ac-a74a-2a77134fac13@ix.netcom.com>
Message-ID:

On 3/5/2024 12:45 PM, James Kass via Unicode wrote:
> Here's one example of this misconception from the document:
> https://www.unicode.org/L2/L2024/24045-ancient-egyptian-rotations.pdf

The document makes a number of very cogent arguments that together imply that adding a layout language subset for arbitrary rotations into plain text is misguided based on the writing system, without even getting into the plain text/rich text discussion.

There are some rich text environments where it is possible to achieve a control over the placement and orientation of glyphs that is rather unrestricted. Those are the correct choice when it comes to faithfully representing individual examples of actual paleographic texts in all their details, accidental or intentional, regular or irregular.

We can all agree that duplicating such capabilities in plain text isn't desirable.

A./

From wjgo_10009 at btinternet.com Wed Mar 6 06:46:14 2024
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Wed, 6 Mar 2024 12:46:14 +0000 (GMT)
Subject: Could Unicode deliver the level of paleographic detail needed for encoding ancient Egyptian hieroglyphs?
Message-ID: <741cb38.9a01.18e13ccd4d1.Webtop.119@btinternet.com>

James Kass wrote:

> So it would be the better practice to first determine how the user community is handling any issue before devising any mark-up scheme just because somebody might consider it useful some day.

Well, I tend to think of my suggestion in this thread as being like how a development in pure mathematics often comes before an application for that development is found.

So, yes, I am putting this suggestion forward because I consider, indeed hope, that it might be useful at some future time. At present it is just a suggestion in a thread in the Unicode public mailing list. Maybe it will never be more than that. Yet I have put the idea forward and people are welcome to apply it in practice if they so choose. If I had not put it forward then encoding the level of paleographic detail in a Unicode encoding that this system could achieve might have been considered impossible.

Actually, what I am suggesting in this thread is a small subset, with the addition of Gr and Gm to that small subset, of a theoretical software system using tag characters that I devised some years ago and which is featured in a chapter (Chapter 9 in case anyone is interested) of my first novel, which was written from time to time from June 2016 to February 2019. The commands Gs, Gh, Gv and G+ are all mentioned in that chapter.

The full list of data types in that chapter is as follows.

I Integer
D Double precision floating point
B Boolean
H Character
S String
Z Complex
Q Quaternion
P Point, as in a font, a point is a triple of two Integers and a Boolean
C Contour, a sequence of P items
G Glyph, a sequence of C items

and L is used for commands for the Link flag.

If applied then there may need to be a few additions so as to have colour font capability.

The virtual machine can have its own temporary storage for each data type, so it will be possible to use commands such as tag characters Gp and Gg to put into and get from, respectively, a memory structure in the virtual machine.

For example,

3Gp would put a copy of the contents of the ag register into mg[3] in the virtual machine.

mg[3]:=ag;

It all goes back to some research that I carried out in the year 2000.

http://www.users.globalnet.co.uk/~ngo/14560000.htm

I have been influenced by the FORTH computer language, which was originally devised to control the motors of a telescope in an observatory.

William Overington

Wednesday 6 March 2024

From pgcon6 at msn.com Wed Mar 6 10:30:19 2024
From: pgcon6 at msn.com (Peter Constable)
Date: Wed, 6 Mar 2024 16:30:19 +0000
Subject: Could Unicode deliver the level of paleographic detail needed for encoding ancient Egyptian hieroglyphs?
In-Reply-To:
References: <70349490.62d6.18e09a157f3.Webtop.119@btinternet.com> <0ebfbd08-f3fa-40ac-a74a-2a77134fac13@ix.netcom.com>
Message-ID:

There is a further misconception here: glyph outlines are defined as Bezier splines, which get represented as a sequence of (x,y) Cartesian coordinates for Bezier control points. However, some glyph IDs can describe the outline by reference to other glyph IDs - these are referred to as composite glyph descriptions. A composite glyph description can apply a 2x2 matrix for an affine transformation. ("Affine" means that parallel lines remain parallel.) The 2x2 matrix can be used to specify a rotation. So, while the outline of a hieroglyph might involve a large number of control points, to have a separate glyph that is a rotation of that hieroglyph requires only a small amount of additional data.

Of course, if there were 10,000 rotational variants, that would add a lot of data. But the statement from that document didn't suggest a large number of rotational variants. It makes a generic statement, "Rotations entail...", which suggests one or more additional variants. Since it is commenting on L2/21-248 which proposed three rotational variants (possibly extended in the future to 7), it has 3 in mind, not 10,000. Adding 3 rotational variants using composite glyph descriptions would be a fairly small increase in data.

(I'm commenting here only on the OpenType font format, not on the encoding of rotational variants of hieroglyphs.)

Peter

-----Original Message-----
From: Unicode On Behalf Of James Kass via Unicode
Sent: Tuesday, March 5, 2024 1:46 PM
To: unicode at corp.unicode.org
Subject: Re: Could Unicode deliver the level of paleographic detail needed for encoding ancient Egyptian hieroglyphs?

Regardless of whether fine glyph rotation gets handled at the plain-text or rich-text level, one of the arguments against fine rotation appears to be based on a misconception related to fonts and rendering.

Here's one example of this misconception from the document:
https://www.unicode.org/L2/L2024/24045-ancient-egyptian-rotations.pdf

"Rotations entail a stark increase in the size of OpenType fonts, as such fonts cannot dynamically rotate glyphs, and therefore need to store rotated copies. To avoid the blow-up in size that would result if all rotations for all signs were included, a selection of rotations for a selection of signs is registered, and fonts would only be expected to implement those."

Glyph data in TrueType/OpenType fonts is stored as a series of Cartesian points. Rotation by any valid degree is accomplished by formula. The font engine is the logical place for implementing any transformative formula, for example, font size. Thus, any glyph from any font could be rotated by any degree dynamically with no impact on the font file size.

From jameskass at code2001.com Wed Mar 6 10:31:58 2024
From: jameskass at code2001.com (James Kass)
Date: Wed, 6 Mar 2024 16:31:58 +0000
Subject: Could Unicode deliver the level of paleographic detail needed for encoding ancient Egyptian hieroglyphs?
In-Reply-To: <741cb38.9a01.18e13ccd4d1.Webtop.119@btinternet.com>
References: <741cb38.9a01.18e13ccd4d1.Webtop.119@btinternet.com>
Message-ID: <519a82b7-6ad5-49ee-9c65-c93df7cb0898@code2001.com>

Here's an idea. It's called mark-up, but most people currently spell it as "markup".

As proof of concept, this is already working here! But, alas, only for a limited subset of Unicode characters.

I tend to think of my suggestion in this thread as being like putting the cart before the horse. But if I had not put my suggestion forward, nobody would have considered fine glyph rotation to be impossible, because any list member here could have conjured up something just as elegant in less than a minute.

Seriously, but also in the department of "nobody asked", here's how to rotate glyphs by any angle:

x1 = x0 cos(θ) - y0 sin(θ)   (Equation 1, calculating the new x co-ordinate)
y1 = x0 sin(θ) + y0 cos(θ)   (Equation 2, calculating the new y co-ordinate)

where
θ (theta) = degrees of desired rotation
x0, y0 = the original x and y co-ordinates
x1, y1 = the target x and y co-ordinates (after rotation)
cos(θ), sin(θ) = the cosine and sine of theta

Equation 1 in plain English: x1 (the new x co-ordinate) equals the old x co-ordinate times the cosine of the desired rotation angle minus the old y co-ordinate times the sine of the desired rotation angle.

Of course, the glyph has now likely shifted out of its "boundary box" and will need to be repositioned appropriately. The lowest values of any glyph's x and y co-ordinates are stored in the font's glyph data. The lowest values of x and y in the rotated glyph would need to be determined programmatically. Then get the deltas between the original and rotated x and y minimums. Apply the x delta to all x co-ordinates and the y delta to all y co-ordinates in the rotated glyph, and presto!

The above is for "simple" glyphs only. For "composite" glyphs (or to accommodate anything done by OpenType features such as glyph positioning), the font engine would establish an appropriate series of Cartesian points and then perform the equations on that new data.

From wjgo_10009 at btinternet.com Wed Mar 6 11:33:53 2024
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Wed, 6 Mar 2024 17:33:53 +0000 (GMT)
Subject: Could Unicode deliver the level of paleographic detail needed for encoding ancient Egyptian hieroglyphs?
Message-ID: <54633ddb.a389.18e14d42d88.Webtop.119@btinternet.com>

James Kass wrote as follows:

> Seriously, but also in the department of "nobody asked", here's how to rotate glyphs by any angle:

The virtual machine would do all of that processing behind the scenes for each point in the glyph once it received an angle and a Gr command.
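
As a rough sketch of that per-point processing, the two rotation equations and the repositioning step quoted earlier in the thread might look like this in Python. The function names `rotate_points` and `rotate_and_reposition` and the sample rectangle are hypothetical, and the sketch assumes a glyph is simply a flat list of (x, y) points; a real font engine would also have to deal with contours, on-curve and off-curve points, and composite glyphs.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rotate_points(points: List[Point], degrees: float) -> List[Point]:
    """Rotate each (x0, y0) about the origin, counterclockwise for a
    positive angle, using Equations 1 and 2."""
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    return [(x0 * c - y0 * s, x0 * s + y0 * c) for x0, y0 in points]


def rotate_and_reposition(points: List[Point], degrees: float) -> List[Point]:
    """Rotate, then shift the result so that its minimum x and minimum y
    match those of the original points."""
    rotated = rotate_points(points, degrees)
    dx = min(x for x, _ in points) - min(x for x, _ in rotated)
    dy = min(y for _, y in points) - min(y for _, y in rotated)
    return [(x + dx, y + dy) for x, y in rotated]


# A 2 x 1 rectangle standing in for a glyph outline (made-up data).
box = [(0, 0), (2, 0), (2, 1), (0, 1)]
for x, y in rotate_and_reposition(box, 90):
    print(round(x, 6), round(y, 6))
```

Taking minimums over the raw points is only an approximation of a bounding box when off-curve control points lie outside the drawn outline.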
I note that the formula quoted rotates the mathematical way, namely counterclockwise for a positive theta.

> Of course, the glyph has now likely shifted out of its "boundary box" and will need to be repositioned appropriately.

Possibly. Yet this need not necessarily be a problem because the Gs command could have been used to scale the glyph before the rotation and the Gr command might be defined to rotate about the centre of the bounding box of the glyph, given that the glyph has been validated during fontmaking as having no outlying off-curve points.

William Overington

Wednesday 6 March 2024

From jameskass at code2001.com Wed Mar 6 13:07:47 2024
From: jameskass at code2001.com (James Kass)
Date: Wed, 6 Mar 2024 19:07:47 +0000
Subject: Could Unicode deliver the level of paleographic detail needed for encoding ancient Egyptian hieroglyphs?
In-Reply-To:
References: <70349490.62d6.18e09a157f3.Webtop.119@btinternet.com> <0ebfbd08-f3fa-40ac-a74a-2a77134fac13@ix.netcom.com>
Message-ID:

On 2024-03-06 4:30 PM, Peter Constable via Unicode wrote:
> There is a further misconception here: glyph outlines are defined as Bezier splines, which get represented as a sequence of (x,y) Cartesian coordinates for Bezier control points. However, some glyph IDs can describe the outline by reference to other glyph IDs - these are referred to as composite glyph descriptions. A composite glyph description can apply a 2x2 matrix for an affine transformation. ("Affine" means that parallel lines remain parallel.) The 2x2 matrix can be used to specify a rotation. So, while the outline of a hieroglyph might involve a large number of control points, to have a separate glyph that is a rotation of that hieroglyph requires only a small amount of additional data.
>
> Of course, if there were 10,000 rotational variants, that would add a lot of data.
> But the statement from that document didn't suggest a large number of rotational variants. It makes a generic statement, "Rotations entail...", which suggests one or more additional variants. Since it is commenting on L2/21-248 which proposed three rotational variants (possibly extended in the future to 7), it has 3 in mind, not 10,000. Adding 3 rotational variants using composite glyph descriptions would be a fairly small increase in data.
>
> (I'm commenting here only on the OpenType font format, not on the encoding of rotational variants of hieroglyphs.)

Exactly. I'm commenting here only on fine rotation for any glyph because the Egyptologists apparently neither want nor need such granularity. But the concept of handling any glyph was brought up by the OP. If anybody wanted such a feature, it would already have been accomplished. And it would most likely have been relegated to rich-text.

Adding even only three rotational variants to a large font, like a CJK font, would be non-trivial. Adding the gamut of possible degrees to a font would be absurd. So any fine rotation support would of necessity be handled by the font engine instead of the font file in this scenario.

My apologies for jumping into this rabbit hole; I should have exercised a little self-restraint.

From pgcon6 at msn.com Thu Mar 7 10:48:57 2024
From: pgcon6 at msn.com (Peter Constable)
Date: Thu, 7 Mar 2024 16:48:57 +0000
Subject: Could Unicode deliver the level of paleographic detail needed for encoding ancient Egyptian hieroglyphs?
In-Reply-To: <54633ddb.a389.18e14d42d88.Webtop.119@btinternet.com>
References: <54633ddb.a389.18e14d42d88.Webtop.119@btinternet.com>
Message-ID:

William, it seems to want to reinvent SVG but limited to text elements and without the XML apparatus. While perhaps interesting as a thought experiment, I don't think you'll get much interest unless you can provide compelling reasons why yet another format is needed.
(The OpenType COLRv1 table format was designed to supersede the SVG table and had, in my opinion, some pretty compelling reasons. But there were still some who said it wasn't needed.)

In any case, what you're discussing is a higher-level protocol than Unicode. _Unicode_ will not be delivering this any time soon.

Peter

From: Unicode On Behalf Of William_J_G Overington via Unicode
Sent: Wednesday, March 6, 2024 10:34 AM
To: unicode at corp.unicode.org
Subject: Re: Could Unicode deliver the level of paleographic detail needed for encoding ancient Egyptian hieroglyphs?

James Kass wrote as follows:

> Seriously, but also in the department of "nobody asked", here's how to rotate glyphs by any angle:

The virtual machine would do all of that processing behind the scenes for each point in the glyph once it received an angle and a Gr command. I note that the formula quoted rotates the mathematical way, namely counterclockwise for a positive theta.

> Of course, the glyph has now likely shifted out of its "boundary box" and will need to be repositioned appropriately.

Possibly. Yet this need not necessarily be a problem because the Gs command could have been used to scale the glyph before the rotation and the Gr command might be defined to rotate about the centre of the bounding box of the glyph, given that the glyph has been validated during fontmaking as having no outlying off-curve points.

William Overington

Wednesday 6 March 2024

From wjgo_10009 at btinternet.com Thu Mar 7 12:56:09 2024
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Thu, 7 Mar 2024 18:56:09 +0000 (GMT)
Subject: Could Unicode deliver the level of paleographic detail needed for encoding ancient Egyptian hieroglyphs?
Message-ID: <1dc42336.c086.18e1a45da38.Webtop.119@btinternet.com>

I know hardly anything about the structure of SVG and XML so I cannot comment on any perceived similarities to what I have written.

Peter Constable wrote:

> While perhaps interesting as a thought experiment, I don't think you'll get much interest unless you can provide compelling reasons why yet another format is needed.

What happened was that I was looking through the UTC Current Document Register and I had a look at the document mentioned in the first post in this thread.

https://www.unicode.org/L2/L2024/24045-ancient-egyptian-rotations.pdf

I saw the statement in the conclusions "and suggest a level of palaeographic detail that Unicode cannot deliver."

It occurred to me that Unicode could possibly deliver that level of palaeographic detail if a subset of the system that I had written about in a chapter of my first novel, with the addition of a Gr command for Glyph rotation, were implemented using Unicode tag characters. (I later added a Gm command for Glyph mirroring horizontally.)

So I wrote about my opinion and posted in this mailing list. Some of us are having a discussion and maybe some other people are also reading the thread.

I would be pleased if people are interested and maybe, just maybe, what I have written may some day help in the encoding of palaeographic detail in Unicode, but if there is no interest, then there we are. I have no expertise in Egyptology. My research interests are in other topics.

> In any case, what you're discussing is a higher-level protocol than Unicode.

I am thinking here of tag characters being used in a Unicode plain text environment to add capability to what can be encoded in Unicode. Tag characters are used to encode The Welsh Flag.

> _Unicode_ will not be delivering this any time soon.

I have suggested an additional way to encode palaeographic detail in a Unicode plain text environment. The method is available free to use if people want to apply it.

William Overington

Thursday 7 March 2024

From asmusf at ix.netcom.com Thu Mar 7 14:30:04 2024
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Thu, 7 Mar 2024 12:30:04 -0800
Subject: Could Unicode deliver the level of paleographic detail needed for encoding ancient Egyptian hieroglyphs?
In-Reply-To: <1dc42336.c086.18e1a45da38.Webtop.119@btinternet.com>
References: <1dc42336.c086.18e1a45da38.Webtop.119@btinternet.com>
Message-ID:

On 3/7/2024 10:56 AM, William_J_G Overington via Unicode wrote:
> I know hardly anything about the structure of SVG and XML so I cannot comment on any perceived similarities to what I have written.

Which means that you are also not informed of "prior art". Which is not usually a good start for any proposal.

A./

From eik at iki.fi Thu Mar 7 15:19:32 2024
From: eik at iki.fi (eik at iki.fi)
Date: Thu, 7 Mar 2024 23:19:32 +0200
Subject: VS: Could Unicode deliver the level of paleographic detail needed for encoding ancient Egyptian hieroglyphs?
In-Reply-To:
References: <1dc42336.c086.18e1a45da38.Webtop.119@btinternet.com>
Message-ID: <010f01da70d5$268a1720$739e4560$@iki.fi>

I strongly suggest that this thread should be closed, which - for good reasons - would not be the first time for the initiator of this one.

Erkki I. Kolehmainen

From: Unicode On Behalf Of Asmus Freytag via Unicode
Sent: Thursday, 7 March 2024 22:30
To: unicode at corp.unicode.org
Subject: Re: Could Unicode deliver the level of paleographic detail needed for encoding ancient Egyptian hieroglyphs?

On 3/7/2024 10:56 AM, William_J_G Overington via Unicode wrote:

I know hardly anything about the structure of SVG and XML so I cannot comment on any perceived similarities to what I have written.

Which means that you are also not informed of "prior art". Which is not usually a good start for any proposal.

A./

From root at corp.unicode.org Thu Mar 7 15:31:11 2024
From: root at corp.unicode.org (root at corp.unicode.org)
Date: Thu, 07 Mar 2024 15:31:11 -0600
Subject: VS: Could Unicode deliver the level of paleographic detail needed for encoding ancient Egyptian hieroglyphs?
Message-ID: <65ea321f.qeO84VUizcKRc/H3%root@corp.unicode.org>

Hello,

Erkki wrote:

> I strongly suggest that this thread should be closed,
> which - for good reasons - would not be the first time
> for the initiator of this one.

Let us therefore please consider this thread closed.

Warm regards,

From arthur at reutenauer.eu Thu Mar 7 15:40:31 2024
From: arthur at reutenauer.eu (Arthur Rosendahl)
Date: Thu, 7 Mar 2024 22:40:31 +0100
Subject: Could Unicode deliver the level of paleographic detail needed for encoding ancient Egyptian hieroglyphs?
In-Reply-To: <1dc42336.c086.18e1a45da38.Webtop.119@btinternet.com>
References: <1dc42336.c086.18e1a45da38.Webtop.119@btinternet.com>
Message-ID:

On Thu, Mar 07, 2024 at 06:56:09PM +0000, William_J_G Overington via Unicode wrote:
> What happened was that I was looking through the UTC Current Document Register and I had a look at the document mentioned in the first post in this thread.
>
> https://www.unicode.org/L2/L2024/24045-ancient-egyptian-rotations.pdf
>
> I saw the statement in the conclusions "and suggest a level of palaeographic detail that Unicode cannot deliver."

Did you read the rest of the document? It is very clear that the author is not suggesting that Unicode should support that level of palaeographic detail. Quite the contrary: he argues against the introduction of additional variation selectors to represent rotations. That stance is fairly obvious by reading only the paragraph from which you extracted the quote with which you started this thread. For the benefit of the list, the context is

7 Conclusions

The use of rotations in Unicode deserves reconsideration.
    Some of the rotations that have already been added to
    StandardizedVariants.txt undermine common assumptions about Unicode
    and Unicode fonts, one of which is that the validity of an encoding
    does not depend on the choice of font. New rotations that have
    recently been proposed also have the potential to misrepresent what
    may be mere inaccuracies in modern handwritten transcriptions, and
    suggest a level of palaeographic detail that Unicode cannot deliver.

The author then mitigates this conclusion somewhat in the next
paragraph, but overall the intent is clear: there is no need to
represent arbitrary rotations of Ancient Egyptian hieroglyphs in
Unicode.

Arthur

From yuri.sukhov at gmail.com Mon Mar 11 19:06:07 2024
From: yuri.sukhov at gmail.com (Yuri Sukhov)
Date: Tue, 12 Mar 2024 04:06:07 +0400
Subject: Identifier caseless matching without toNFKC_Casefold
Message-ID:

Hi,

I'm implementing caseless matching for strings used as identifiers.
I'm aware that the NFKC_Casefold mapping and the related
toNFKC_Casefold() string transform are designed for such a scenario.
Unfortunately, the language and libraries I'm using do not implement
toNFKC_Casefold(), so I'm looking for an alternative approach.

My use case does not seem to require the removal of default-ignorables;
for now I'm only concerned with the case and compatibility variations.
It looks like the definition of the compatibility caseless match is
what I need:

A string X is a compatibility caseless match for a string Y if and only
if: NFKD(toCasefold(NFKD(toCasefold(NFD(X))))) =
NFKD(toCasefold(NFKD(toCasefold(NFD(Y)))))

However, I can't seem to find the case where that extra cycle of
folding/normalization makes the difference.
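
As a rough illustration of the definition quoted above (not part of the original message), the transform can be sketched in Python. The function names are hypothetical, and the sketch assumes that Python's `str.casefold()` is an acceptable stand-in for toCasefold and that `unicodedata.normalize` provides NFD and NFKD for the Unicode version bundled with the interpreter. A one-cycle variant is included only so that the effect of the extra folding/normalization cycle can be probed on sample input.

```python
import unicodedata


def nfd(s: str) -> str:
    """Canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", s)


def nfkd(s: str) -> str:
    """Compatibility decomposition (NFKD)."""
    return unicodedata.normalize("NFKD", s)


def one_cycle_key(s: str) -> str:
    # Hypothetical helper: NFKD(toCasefold(NFD(X))), i.e. the definition
    # with the outer fold/normalize cycle left out.
    return nfkd(nfd(s).casefold())


def compat_caseless_key(s: str) -> str:
    # NFKD(toCasefold(NFKD(toCasefold(NFD(X))))) as quoted in the message.
    # Assumption: str.casefold() stands in for toCasefold.
    return nfkd(nfkd(nfd(s).casefold()).casefold())


def compat_caseless_match(x: str, y: str) -> bool:
    # Two strings match when their keys are identical.
    return compat_caseless_key(x) == compat_caseless_key(y)


if __name__ == "__main__":
    sample = "\u3392"  # SQUARE MHZ, a compatibility character
    print(ascii(one_cycle_key(sample)))
    print(ascii(compat_caseless_key(sample)))
    print(compat_caseless_match(sample, "MHZ"))
```

With this NFD-first ordering, U+3392 SQUARE MHZ is one input where the two keys differ: the first casefold leaves the square character untouched, so the letters exposed by the later NFKD step are only folded by the second cycle.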
It seems to me that the same result - compatibility caseless match -
can be achieved with a simpler approach:

NFC(toCasefold(NFKD(X)))

Basically, I think about it as 1) removing the compatibility variations
by normalizing with decomposition, 2) then removing the case differences
from this decomposed sequence, 3) and finally storing a folded string in
a potentially shorter NFC form.

It looks like it checks all the boxes, and my - likely naive - testing
shows that

NFC(toCasefold(NFKD(X))) = NFKD(toCasefold(NFKD(toCasefold(NFD(X)))))

I'm sure I'm missing something, and would appreciate an explanation of
why/when this won't work.

Yuri
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From addisoni18n at gmail.com Tue Mar 12 08:44:46 2024
From: addisoni18n at gmail.com (Addison Phillips)
Date: Tue, 12 Mar 2024 06:44:46 -0700
Subject: Identifier caseless matching without toNFKC_Casefold
In-Reply-To:
References:
Message-ID: <001f01da7483$72ecf2c0$58c6d840$@gmail.com>

Hi Yuri,

The part of the W3C "Character Model" called "String Matching for the
Web" illustrates the case in this section:

https://www.w3.org/TR/charmod-norm/#normalizationAndCasefold

You might find the rest of the document useful in your work as well.

Best regards,

Addison

Addison Phillips
Chair (W3C Internationalization WG)

Internationalization is not a feature.
It is an architecture.

From: Unicode On Behalf Of Yuri Sukhov via Unicode
Sent: Monday, March 11, 2024 5:06 PM
To: unicode at corp.unicode.org
Subject: Identifier caseless matching without toNFKC_Casefold

Hi,

I'm implementing caseless matching for strings used as identifiers.
I'm aware that the NFKC_Casefold mapping and the related
toNFKC_Casefold() string transform are designed for such a scenario.
Unfortunately, the language and libraries I'm using do not implement
toNFKC_Casefold(), so I'm looking for an alternative approach.

My use case does not seem to require the removal of default-ignorables;
for now I'm only concerned with the case and compatibility variations.
It looks like the definition of the compatibility caseless match is
what I need:

A string X is a compatibility caseless match for a string Y if and only
if: NFKD(toCasefold(NFKD(toCasefold(NFD(X))))) =
NFKD(toCasefold(NFKD(toCasefold(NFD(Y)))))

However, I can't seem to find the case where that extra cycle of
folding/normalization makes the difference. It seems to me that the
same result - compatibility caseless match - can be achieved with a
simpler approach:

NFC(toCasefold(NFKD(X)))

Basically, I think about it as 1) removing the compatibility variations
by normalizing with decomposition, 2) then removing the case differences
from this decomposed sequence, 3) and finally storing a folded string in
a potentially shorter NFC form.

It looks like it checks all the boxes, and my - likely naive - testing
shows that

NFC(toCasefold(NFKD(X))) = NFKD(toCasefold(NFKD(toCasefold(NFD(X)))))

I'm sure I'm missing something, and would appreciate an explanation of
why/when this won't work.

Yuri
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From julesbertholet at quoi.xyz Tue Mar 12 11:57:19 2024
From: julesbertholet at quoi.xyz (Jules Bertholet)
Date: Tue, 12 Mar 2024 16:57:19 +0000 (UTC)
Subject: UAX 11 and new Unicode 16 variation sequences
Message-ID: <83V8AS.L3FMFT6ZAUHA@quoi.xyz>

Unicode 16 will add 8 new standardized variation sequences:

https://unicode.org/alloc/Pipeline.html#variation_sequences,
https://www.unicode.org/L2/L2023/23212r-quotes-svs-proposal.pdf

These new sequences have the peculiarity that they affect the width of
the character they modify. In previous versions of Unicode, this was
only true of emoji variation sequences.
However, UAX 11 "East Asian Width"
(https://www.unicode.org/reports/tr11/) currently does not account for
the possibility of a non-emoji variation sequence affecting character
width, and no update to UAX 11 has yet been proposed for Unicode 16. Is
there a plan to address this gap?

Jules Bertholet

From yuri.sukhov at gmail.com Tue Mar 12 15:08:18 2024
From: yuri.sukhov at gmail.com (Yuri Sukhov)
Date: Wed, 13 Mar 2024 00:08:18 +0400
Subject: Identifier caseless matching without toNFKC_Casefold
In-Reply-To: <001f01da7483$72ecf2c0$58c6d840$@gmail.com>
References: <001f01da7483$72ecf2c0$58c6d840$@gmail.com>
Message-ID:

Hi Addison,

Thank you for the link; the examples were very useful. The more I look
at them, the more convinced I become that the compatibility caseless
match transform NFKD(toCasefold(NFKD(toCasefold(NFD(X))))) is more
expensive than necessary.

The second (outer) casefold+normalization cycle can be avoided if we
perform the initial NFKD normalization *before* the first casefold.
Doing compatibility decomposition before the casefolding eliminates the
problem with the U+3392 character illustrated in example 19. And since
it's recommended to decompose before the initial casefold anyway (the
Greek ypogegrammeni/iota issue), NFKD normalization as the first step
also covers that case.

As a result, the transform is reduced from 3 normalizations + 2
casefolds to 2 normalizations and 1 casefold:

NFKD(toCasefold(NFKD(toCasefold(NFD(X))))) --> NFC(toCasefold(NFKD(X)))

Am I missing any non-trivial cases? For the examples on the mentioned
page, as well as for the other "problematic" cases I've seen in other
places, this lighter transform produces the same output as the more
expensive one from the standard.

Kind regards,
Yuri

On Tue, Mar 12, 2024 at 5:44 PM Addison Phillips wrote:

> Hi Yuri,
>
> The part of the W3C "Character Model" called "String Matching for the Web"
> illustrates the case in this section:
>
> https://www.w3.org/TR/charmod-norm/#normalizationAndCasefold
>
> You might find the rest of the document useful in your work as well.
>
> Best regards,
>
> Addison
>
> Addison Phillips
> Chair (W3C Internationalization WG)
>
> Internationalization is not a feature.
> It is an architecture.
>
> *From:* Unicode *On Behalf Of* Yuri Sukhov via Unicode
> *Sent:* Monday, March 11, 2024 5:06 PM
> *To:* unicode at corp.unicode.org
> *Subject:* Identifier caseless matching without toNFKC_Casefold
>
> Hi,
>
> I'm implementing caseless matching for strings used as identifiers. I'm
> aware that the NFKC_Casefold mapping and the related toNFKC_Casefold()
> string transform are designed for such a scenario. Unfortunately, the
> language and libraries I'm using do not implement toNFKC_Casefold(), so
> I'm looking for an alternative approach.
>
> My use case does not seem to require the removal of default-ignorables;
> for now I'm only concerned with the case and compatibility variations. It
> looks like the definition of the compatibility caseless match is what I
> need:
>
> A string X is a compatibility caseless match for a string Y if and only
> if: NFKD(toCasefold(NFKD(toCasefold(NFD(X))))) =
> NFKD(toCasefold(NFKD(toCasefold(NFD(Y)))))
>
> However, I can't seem to find the case where that extra cycle of
> folding/normalization makes the difference. It seems to me that the same
> result - compatibility caseless match - can be achieved with a simpler
> approach:
>
> NFC(toCasefold(NFKD(X)))
>
> Basically, I think about it as 1) removing the compatibility variations by
> normalizing with decomposition, 2) then removing the case differences from
> this decomposed sequence, 3) and finally storing a folded string in a
> potentially shorter NFC form.
>
> It looks like it checks all the boxes, and my - likely naive - testing
> shows that
>
> NFC(toCasefold(NFKD(X))) = NFKD(toCasefold(NFKD(toCasefold(NFD(X)))))
>
> I'm sure I'm missing something, and would appreciate an explanation of
> why/when this won't work.
>
> Yuri
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From hujialun at comp.nus.edu.sg Wed Mar 13 12:29:04 2024
From: hujialun at comp.nus.edu.sg (Hu Jialun)
Date: Thu, 14 Mar 2024 01:29:04 +0800
Subject: Correct way to encode mixed width text in Unicode?
Message-ID: <4fae262b5e5469c6bef51fe80d45eb6d@comp.nus.edu.sg>

From what I read [^1], the fullwidth glyphs in Unicode are provided
solely for backward compatibility and lossless roundtrip with legacy
standards such as Shift-JIS. The rationale [^2] seems to be that Unicode
views it as a presentational issue that is better dealt with by the
renderer based on linguistic context, and use of such characters is
generally discouraged. In some cases, no compatibility character is
provided at all, such as fullwidth left/right single/double quotation
marks, because no legacy encoding contains both full- and half-width
forms, and Unicode explicitly rejects adding any more such characters.

Unicode recommends in the same document,

    Ambiguous quotation marks are generally resolved to wide when they
    enclose and are adjacent to a wide character, and to narrow
    otherwise.

However, there are cases where the width gets tricky to resolve, which
sometimes yields incorrect results across current fonts and renderer
implementations,

    ??????????????????N?????Nostalgia?????

    ?Make a wish! Make a wish!????????

    The term ?char kway teow? is a transliteration of the Chinese
    characters ?????.

    ????????Hamlet??????Polonius (II.ii.) ?Though this be
    madness, yet there is method in?t.???

    ???????????????????????????????????????
    ??????????????????????

It seems that the recommended algorithm fails in such cases (rendered
inconsistently e.g.
with fullwidth left quote and halfwidth right quote), and such cases
may just be too complex for an algorithm to render without intricate
and fragile rulesets for the language itself.

This issue mainly affects Simplified Chinese but not other East Asian
languages, because Traditional Chinese, Japanese and vertically written
Korean commonly use the U+300C-300F CORNER BRACKET family
(East_Asian_Width=Wide).

My question is thus: is there a common way to provide a hint in
plaintext for the width of an ambiguous width character, maybe as a
Unicode variation selector or something like RLM?

[^1]: https://harjit.moe/hwfwblame.html
[^2]: https://www.unicode.org/reports/tr11/tr11-41.html#Relation

Originally asked at:
<https://superuser.com/questions/1828050/correct-way-to-encode-mixed-width-text-in-unicode>

~hujialun

From ecm.unicode at gmail.com Wed Mar 13 16:07:12 2024
From: ecm.unicode at gmail.com (Erik Carvalhal Miller)
Date: Wed, 13 Mar 2024 17:07:12 -0400
Subject: Correct way to encode mixed width text in Unicode?
In-Reply-To: <4fae262b5e5469c6bef51fe80d45eb6d@comp.nus.edu.sg>
References: <4fae262b5e5469c6bef51fe80d45eb6d@comp.nus.edu.sg>
Message-ID:

It's planned. See .

On Wed, Mar 13, 2024 at 1:50 PM Hu Jialun via Unicode <unicode at corp.unicode.org> wrote:

> From what I read [^1], the fullwidth glyphs in Unicode are provided
> solely for backward compatibility and lossless roundtrip with legacy
> standards such as Shift-JIS. The rationale [^2] seems to be that Unicode
> views it as a presentational issue that is better dealt with by the
> renderer based on linguistic context, and use of such characters is
> generally discouraged. In some cases, no compatibility character is
> provided at all, such as fullwidth left/right single/double quotation
> marks, because no legacy encoding contains both full- and half-width
> forms, and Unicode explicitly rejects adding any more such characters.
>
> Unicode recommends in the same document,
>
>     Ambiguous quotation marks are generally resolved to wide when they
>     enclose and are adjacent to a wide character, and to narrow
>     otherwise.
>
> However, there are cases where the width gets tricky to resolve, which
> sometimes yields incorrect results across current fonts and renderer
> implementations,
>
>     ??????????????????N?????Nostalgia?????
>
>     ?Make a wish! Make a wish!????????
>
>     The term ?char kway teow? is a transliteration of the Chinese
>     characters ?????.
>
>     ????????Hamlet??????Polonius (II.ii.) ?Though this be
>     madness, yet there is method in?t.???
>
>     ???????????????????????????????????????
>     ??????????????????????
>
> It seems that the recommended algorithm fails in such cases (rendered
> inconsistently e.g. with fullwidth left quote and halfwidth right
> quote), and such cases may just be too complex for an algorithm to
> render without intricate and fragile rulesets for the language itself.
>
> This issue mainly affects Simplified Chinese but not other East Asian
> languages, due to the fact that Traditional Chinese, Japanese and
> vertically written Korean commonly use the U+300C-300F CORNER BRACKET
> family (East_Asian_Width=Wide).
>
> My question is thus, is there a common way to provide a hint in
> plaintext for the width of an ambiguous width character, maybe as a
> Unicode variation selector or something like RLM?
>
> [^1]: https://harjit.moe/hwfwblame.html
> [^2]: https://www.unicode.org/reports/tr11/tr11-41.html#Relation
>
> Originally asked at:
> <https://superuser.com/questions/1828050/correct-way-to-encode-mixed-width-text-in-unicode>
>
> ~hujialun
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
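
To make the width question in this thread more concrete, here is a small Python sketch (not from any of the messages). It looks up the East_Asian_Width property with the standard `unicodedata` module and then applies a deliberately naive reading of the quoted heuristic: an Ambiguous character is treated as wide if either immediate neighbour is Wide or Fullwidth. The function name `resolve_ambiguous` and the sample string are hypothetical, and the adjacency rule is an assumption made for illustration only; it is not the algorithm specified in UAX #11.

```python
import unicodedata


def resolve_ambiguous(text: str) -> list[tuple[str, str]]:
    """Return (char, 'wide' | 'narrow') for each East_Asian_Width=A char.

    Naive rule (an assumption for illustration): wide if an immediate
    neighbour has East_Asian_Width W or F, narrow otherwise.
    """
    resolved = []
    for i, ch in enumerate(text):
        if unicodedata.east_asian_width(ch) != "A":
            continue
        # Collect the characters directly before and after, if any.
        neighbours = text[max(i - 1, 0):i] + text[i + 1:i + 2]
        is_wide = any(
            unicodedata.east_asian_width(n) in ("W", "F") for n in neighbours
        )
        resolved.append((ch, "wide" if is_wide else "narrow"))
    return resolved


if __name__ == "__main__":
    # Property values for the characters discussed in the thread.
    for cp in ("\u201c", "\u201d", "\u300c", "\u300d"):
        print(f"U+{ord(cp):04X}", unicodedata.east_asian_width(cp))

    # Hypothetical sample: two CJK ideographs, then a Latin word in
    # curly quotation marks (U+201C ... U+201D).
    sample = "\u4ed6\u8bf4\u201cHello\u201d"
    for ch, width in resolve_ambiguous(sample):
        print(f"U+{ord(ch):04X}", width)
```

On the hypothetical sample, the opening quotation mark sits next to an ideograph while the closing one sits next to a Latin letter, so a purely local rule resolves the pair to different widths. That is the kind of mismatched pair the original question describes.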