From wjgo_10009 at btinternet.com  Mon Mar  4 07:22:32 2024
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Mon, 4 Mar 2024 13:22:32 +0000 (GMT)
Subject: Could Unicode deliver the level of paleographic detail needed for
 encoding ancient Egyptian hieroglyphs?
Message-ID: <70349490.62d6.18e09a157f3.Webtop.119@btinternet.com>


I have no expertise in Egyptology; I am, however, interested in Unicode
encoding research.

I have been reading the following document.

https://www.unicode.org/L2/L2024/24045-ancient-egyptian-rotations.pdf

I opine that Unicode could possibly deliver that level of palaeographic
detail if a custom virtual machine is defined and then implemented in
the rendering system.

The rotation, and any other movements and scaling, would be encoded as a
sequence of software-like commands, expressed using tag characters and
included in the plain text sequence.

The commands would be software-like, yet with no loops, jumps or calls,
so more like a list of hand-entered commands to a calculator.

The glyphs that are manipulated would be obtained from the font. The
software-like sequences would be executed by a virtual machine in the
rendering application.

Once implemented, an end user would be able to specify rotation of a
glyph by, say, 20 degrees, using a tag character sequence of something
like

20Gr

after the Unicode code point of the character.

The two-character tag sequence Gr would be the command to the virtual
machine to rotate the glyph by the number of degrees previously stated.

Scaling by 25% and then rotating by 20 degrees would be specified by a
tag sequence something like

25Gs20Gr

after the Unicode code point of the character.
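
As a rough illustration of the idea described so far, the following
Python sketch encodes an ASCII command string such as 25Gs20Gr as tag
characters (each tag character in U+E0020..U+E007E is the corresponding
ASCII character offset by 0xE0000) and splits it back into (number,
command) pairs. The function names, the number-plus-two-letter grammar
and the choice of U+13000 as the base character are assumptions made
for this sketch only; no such command syntax is defined by Unicode.

```python
import re

TAG_OFFSET = 0xE0000  # tag characters U+E0020..U+E007E mirror ASCII 0x20..0x7E


def to_tags(ascii_text: str) -> str:
    """Map printable ASCII characters to the corresponding tag characters."""
    if not all(0x20 <= ord(ch) <= 0x7E for ch in ascii_text):
        raise ValueError("only printable ASCII can be expressed as tag characters")
    return "".join(chr(ord(ch) + TAG_OFFSET) for ch in ascii_text)


def from_tags(text: str) -> str:
    """Recover the ASCII form of every tag character in text, ignoring the rest."""
    return "".join(
        chr(ord(ch) - TAG_OFFSET)
        for ch in text
        if 0xE0020 <= ord(ch) <= 0xE007E
    )


# Hypothetical grammar: an optionally signed integer, then "G" and one letter.
COMMAND = re.compile(r"([+-]?\d+)(G[a-z])")


def parse_commands(ascii_commands: str) -> list[tuple[int, str]]:
    """Split e.g. '25Gs20Gr' into [(25, 'Gs'), (20, 'Gr')]."""
    pairs = COMMAND.findall(ascii_commands)
    # Reject input that contains anything besides well-formed commands.
    if "".join(num + cmd for num, cmd in pairs) != ascii_commands:
        raise ValueError(f"unrecognised command text: {ascii_commands!r}")
    return [(int(num), cmd) for num, cmd in pairs]


base = "\U00013000"  # EGYPTIAN HIEROGLYPH A001, used only as an example base
encoded = base + to_tags("25Gs20Gr")
print(parse_commands(from_tags(encoded)))
```

A renderer that does not know the hypothetical commands would see only
the base character followed by eight default-ignorable tag characters.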

The specification would need to state about which point the glyph is
scaled and about which point the glyph is rotated.

There could be commands such as Gh and Gv for horizontal and vertical
movements respectively, with the specification stating how the distance
is expressed: for example, as a percentage of the width of a unit
square. There could also be a Gi command to encode movement in and out
if so desired.

Both positive numbers and negative numbers could be used for rotations
and movements, so rotations could be both clockwise and
counterclockwise, and movements could be right, left, up, down, in or
out.

Other commands could be added as required by experts who have knowledge
about the hieroglyphs.

A hieroglyph made up of various glyphs (each of which could be scaled,
rotated and located) could be specified by a tag sequence between
U+E007B TAG LEFT CURLY BRACKET
and
U+E007D TAG RIGHT CURLY BRACKET.

William Overington

Monday 4 March 2024

From asmusf at ix.netcom.com  Mon Mar  4 14:44:08 2024
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Mon, 4 Mar 2024 12:44:08 -0800
Subject: Could Unicode deliver the level of paleographic detail needed for
 encoding ancient Egyptian hieroglyphs?
In-Reply-To: <70349490.62d6.18e09a157f3.Webtop.119@btinternet.com>
References: <70349490.62d6.18e09a157f3.Webtop.119@btinternet.com>
Message-ID: <0ebfbd08-f3fa-40ac-a74a-2a77134fac13@ix.netcom.com>

What you are describing is rich text. Anytime you add "special 
commands", no matter how you encode them, you have rich text. (There is 
a small amount of gray zone, in which characters like SHY, NBSP and TAB 
can be understood as still being "plain text", but a syntax for a 
virtual rotation machine is definitely beyond the scope).

A./



From wjgo_10009 at btinternet.com  Mon Mar  4 16:13:25 2024
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Mon, 4 Mar 2024 22:13:25 +0000 (GMT)
Subject: Could Unicode deliver the level of paleographic detail needed
 for encoding ancient Egyptian hieroglyphs?
Message-ID: <300bb5cd.72c5.18e0b8762df.Webtop.119@btinternet.com>


Asmus Freytag wrote as follows:

> What you are describing is rich text. Anytime you add "special
> commands", no matter how you encode them, you have rich text. (There
> is a small amount of gray zone, in which characters like SHY, NBSP and
> TAB can be understood as still being "plain text", but a syntax for a
> virtual rotation machine is definitely beyond the scope).

Thank you for replying.

Is what I suggest any less plain text than is the tag sequence specified
for the Welsh flag? If so, why?
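
For reference, the Welsh flag is an emoji tag sequence: U+1F3F4 WAVING
BLACK FLAG, followed by tag characters spelling the region code gbwls,
followed by U+E007F CANCEL TAG. A minimal Python sketch that builds the
sequence is given here; the function name subdivision_flag is chosen
for this sketch only.

```python
TAG_OFFSET = 0xE0000
BLACK_FLAG = "\U0001F3F4"  # WAVING BLACK FLAG, the base of the sequence
CANCEL_TAG = "\U000E007F"  # CANCEL TAG, terminates the sequence


def subdivision_flag(region_code: str) -> str:
    """Build an emoji tag sequence for a subdivision code such as 'gbwls'."""
    tags = "".join(chr(ord(ch) + TAG_OFFSET) for ch in region_code.lower())
    return BLACK_FLAG + tags + CANCEL_TAG


welsh_flag = subdivision_flag("gbwls")
# Prints: U+1F3F4 U+E0067 U+E0062 U+E0077 U+E006C U+E0073 U+E007F
print(" ".join(f"U+{ord(ch):04X}" for ch in welsh_flag))
```

In this sequence the tag characters spell a fixed identifier; they carry
no numeric parameters.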

At present, if I understand it correctly, special commands have been
added into plain text in the form of extra characters for formatting
hieroglyphs.

I suggest that using this software-like approach allows far greater
possibilities than can reasonably be produced by adding an extra
character into Unicode for each possibility.

This software-like approach means that any rotation angle can be
specified.

I opine that if the scope of plain text in a definition from long ago
needs to be changed so as to allow progress into the future then that
needs to be done.

Since making my earlier post, I have realized that a Gm command needs to
be added so that a glyph can be horizontally mirrored if so desired.

Other commands and features can be added straightforwardly to this
software-like system.

I opine that whether the method that I have suggested goes forward and
becomes part of Unicode should depend on whether it is regarded by the
Egyptologists who would use it as a method that they consider as being
of benefit.

William Overington

Monday 4 March 2024

From jameskass at code2001.com  Tue Mar  5 14:45:54 2024
From: jameskass at code2001.com (James Kass)
Date: Tue, 5 Mar 2024 20:45:54 +0000
Subject: Could Unicode deliver the level of paleographic detail needed for
 encoding ancient Egyptian hieroglyphs?
In-Reply-To: <0ebfbd08-f3fa-40ac-a74a-2a77134fac13@ix.netcom.com>
References: <70349490.62d6.18e09a157f3.Webtop.119@btinternet.com>
 <0ebfbd08-f3fa-40ac-a74a-2a77134fac13@ix.netcom.com>
Message-ID: <c5a69f24-354d-4062-878c-0c6917ad3c49@code2001.com>


On 2024-03-04 8:44 PM, Asmus Freytag via Unicode wrote:
> What you are describing is rich text. Anytime you add "special 
> commands", no matter how you encode them, you have rich text. (There 
> is a small amount of gray zone, in which characters like SHY, NBSP and 
> TAB can be understood as still being "plain text", but a syntax for a 
> virtual rotation machine is definitely beyond the scope).
>
> A./

Egyptologists were (and are) involved in the process to encode
hieroglyphics.  Any shortcomings in the encoding should be addressed by
those experts.  People involved in the decisions understand the needs of
the user community as well as the Unicode principles for character
encoding.  The document makes a convincing argument against enabling a
plethora of rotations willy-nilly at the plain-text level.

Plain-text legibility is the issue for Unicode.  Whether there is a
semantic difference between a glyph rotated by 48 degrees and one
rotated by 49 degrees is not for me to say, but it seems unlikely.  It is
up to the experts to determine whether a string of hieroglyphic
characters is legible.  So it would be the better practice to first
determine how the user community is handling any issue before devising
any mark-up scheme just because somebody might consider it useful some day.

Regardless of whether fine glyph rotation gets handled at the plain-text 
or rich-text level, one of the arguments against fine rotation appears 
to be based on a misconception related to fonts and rendering.

Here's one example of this misconception from the document:
https://www.unicode.org/L2/L2024/24045-ancient-egyptian-rotations.pdf

"Rotations entail a stark increase in the size of OpenType fonts, as 
such fonts cannot dynamically rotate glyphs, and therefore need to store 
rotated copies. To avoid the blow-up in size that would result if all 
rotations for all signs were included, a selection of rotations for a 
selection of signs is registered, and fonts would only be expected to 
implement those."

Glyph data in TrueType/OpenType fonts is stored as a series of
Cartesian points.  Rotation by any valid degree is accomplished by
formula.  The font engine is the logical place for implementing any
transformative formula, such as scaling to a font size.  Thus, any glyph
from any font could be rotated by any degree dynamically with no impact
on the font file size.


From asmusf at ix.netcom.com  Tue Mar  5 16:19:32 2024
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Tue, 5 Mar 2024 14:19:32 -0800
Subject: Could Unicode deliver the level of paleographic detail needed for
 encoding ancient Egyptian hieroglyphs?
In-Reply-To: <c5a69f24-354d-4062-878c-0c6917ad3c49@code2001.com>
References: <70349490.62d6.18e09a157f3.Webtop.119@btinternet.com>
 <0ebfbd08-f3fa-40ac-a74a-2a77134fac13@ix.netcom.com>
 <c5a69f24-354d-4062-878c-0c6917ad3c49@code2001.com>
Message-ID: <ba7b7735-5a00-47f5-9d3f-036dfcb0371c@ix.netcom.com>

On 3/5/2024 12:45 PM, James Kass via Unicode wrote:
> Here's one example of this misconception from the document:
> https://www.unicode.org/L2/L2024/24045-ancient-egyptian-rotations.pdf

The document makes a number of very cogent arguments that together imply 
that adding a layout language subset for arbitrary rotations into plain 
text is misguided based on the writing system, without even getting into 
the plain text/rich text discussion.

There are some rich text environments where it is possible to achieve a 
control over the placement and orientation of glyphs that is rather 
unrestricted. Those are the correct choice when it comes to faithfully 
representing individual examples of actual paleographic texts in all
their details, accidental or intentional, regular or irregular.

We can all agree that duplicating such capabilities in plain text isn't 
desirable.

A./

From wjgo_10009 at btinternet.com  Wed Mar  6 06:46:14 2024
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Wed, 6 Mar 2024 12:46:14 +0000 (GMT)
Subject: Could Unicode deliver the level of paleographic detail needed
 for encoding ancient Egyptian hieroglyphs?
Message-ID: <741cb38.9a01.18e13ccd4d1.Webtop.119@btinternet.com>


James Kass wrote:

> So it would be the better practice to first determine how the user
> community is handling any issue before devising any mark-up scheme
> just because somebody might consider it useful some day.

Well, I tend to think of my suggestion in this thread as being like the
way a development in pure mathematics often comes before an application
for that development is found.

So, yes, I am putting this suggestion forward because I consider, indeed
hope, that it might be useful at some future time. At present it is just
a suggestion in a thread in the Unicode public mailing list. Maybe it
will never be more than that. Yet I have put the idea forward and people
are welcome to apply it in practice if they so choose. If I had not put
it forward then encoding the level of paleographic detail in a Unicode
encoding that this system could achieve might have been considered
impossible.

Actually, what I am suggesting in this thread is a small subset, with
the addition of Gr and Gm to that small subset, of a theoretical
software system using tag characters that I devised some years ago and
which is featured in a chapter (Chapter 9 in case anyone is interested)
of my first novel, which was written from time to time from June 2016 to
February 2019. The commands Gs, Gh, Gv and G+ are all mentioned in that
chapter.

The full list of data types in that chapter is as follows.

I  Integer
D  Double precision floating point
B  Boolean
H  Character
S  String
Z  Complex
Q  Quaternion
P  Point, as in a font, a point is a triple of two Integers and a Boolean
C  Contour, a sequence of P items
G  Glyph, a sequence of C items

and L is used for commands for the Link flag.

If applied then there may need to be a few additions so as to have
colour font capability.

The virtual machine can have its own temporary storage for each data
type, so it will be possible to use commands such as tag characters Gp
and Gg to put into, and get from, respectively, a memory structure in
the virtual machine.

For example,

3Gp would put a copy of the contents of the ag register into mg[3] in
the virtual machine.

mg[3]:=ag;
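
A minimal Python sketch of how such a put/get pair might behave is
given here. The class name GlyphMachine, the execute method and the
representation of a glyph as a list of contours of (x, y, on_curve)
triples are assumptions for this sketch; only the names ag, mg, Gp and
Gg come from the description above.

```python
class GlyphMachine:
    """Hypothetical fragment of the virtual machine: a glyph register and memory."""

    def __init__(self, ag=None):
        self.ag = ag if ag is not None else []  # current glyph register
        self.mg = {}                            # glyph memory, indexed by integer

    def execute(self, number, command):
        if command == "Gp":    # put: mg[number] := ag (stores a copy)
            self.mg[number] = [list(contour) for contour in self.ag]
        elif command == "Gg":  # get: ag := mg[number] (loads a copy)
            self.ag = [list(contour) for contour in self.mg[number]]
        else:
            raise ValueError(f"unknown command {command!r}")


# One contour of three on-curve points stands in for a real glyph.
vm = GlyphMachine(ag=[[(0, 0, True), (100, 0, True), (50, 80, True)]])
vm.execute(3, "Gp")  # corresponds to the tag sequence 3Gp
vm.ag = []           # the register is overwritten by later work...
vm.execute(3, "Gg")  # ...and restored from mg[3]
print(vm.ag)
```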

It all goes back to some research that I carried out in the year 2000.

http://www.users.globalnet.co.uk/~ngo/14560000.htm

I have been influenced by the FORTH computer language, which was
originally devised to control the motors of a telescope in an
observatory.

William Overington

Wednesday 6 March 2024

From pgcon6 at msn.com  Wed Mar  6 10:30:19 2024
From: pgcon6 at msn.com (Peter Constable)
Date: Wed, 6 Mar 2024 16:30:19 +0000
Subject: Could Unicode deliver the level of paleographic detail needed for
 encoding ancient Egyptian hieroglyphs?
In-Reply-To: <c5a69f24-354d-4062-878c-0c6917ad3c49@code2001.com>
References: <70349490.62d6.18e09a157f3.Webtop.119@btinternet.com>
 <0ebfbd08-f3fa-40ac-a74a-2a77134fac13@ix.netcom.com>
 <c5a69f24-354d-4062-878c-0c6917ad3c49@code2001.com>
Message-ID: <DS0PR12MB7535DB5DB438C2800F2C17C786212@DS0PR12MB7535.namprd12.prod.outlook.com>

There is a further misconception here: glyph outlines are defined as Bezier splines, which get represented as a sequence of (x,y) Cartesian coordinates for Bezier control points. However, some glyph IDs can describe the outline by reference to other glyph IDs - these are referred to as composite glyph descriptions. A composite glyph description can apply a 2x2 matrix for an affine transformation. ("Affine" means that parallel lines remain parallel.) The 2x2 matrix can be used to specify a rotation. So, while the outline of a hieroglyph might involve a large number of control points, to have a separate glyph that is a rotation of that hieroglyph requires only a small amount of additional data.
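
To give a feel for how little data the matrix needs, here is a small Python sketch that computes the four entries of a rotation matrix and converts each to F2DOT14, the signed 16-bit fixed-point format with 14 fractional bits that the TrueType 'glyf' table uses for composite-glyph transforms. The function names are made up for this sketch, and the order in which a font file stores the four entries is left to the font format specification rather than asserted here.

```python
import math


def rotation_matrix(degrees: float) -> tuple[float, float, float, float]:
    """Return (a, b, c, d) such that x' = a*x + b*y and y' = c*x + d*y."""
    t = math.radians(degrees)
    return (math.cos(t), -math.sin(t), math.sin(t), math.cos(t))


def to_f2dot14(value: float) -> int:
    """Encode a value in [-2, 2) as a signed integer with 14 fractional bits."""
    if not -2.0 <= value < 2.0:
        raise ValueError("value out of F2DOT14 range")
    return round(value * (1 << 14))


matrix = rotation_matrix(90)
encoded = [to_f2dot14(v) for v in matrix]
print(encoded)  # [0, -16384, 16384, 0]
```

Four 16-bit values come to 8 bytes for the matrix itself; the rest of the composite glyph record adds a few more bytes on top of that.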

Of course, if there were 10,000 rotational variants, that would add a lot of data. But the statement from that document didn't suggest a large number of rotational variants. It makes a generic statement, "Rotations entail...", which suggests one or more additional variants. Since it is commenting on L2/21-248 which proposed three rotational variants (possibly extended in the future to 7), it has 3 in mind, not 10,000. Adding 3 rotational variants using composite glyph descriptions would be a fairly small increase in data.


(I'm commenting here only on the OpenType font format, not on the encoding of rotational variants of hieroglyphs.)


Peter




From jameskass at code2001.com  Wed Mar  6 10:31:58 2024
From: jameskass at code2001.com (James Kass)
Date: Wed, 6 Mar 2024 16:31:58 +0000
Subject: Could Unicode deliver the level of paleographic detail needed for
 encoding ancient Egyptian hieroglyphs?
In-Reply-To: <741cb38.9a01.18e13ccd4d1.Webtop.119@btinternet.com>
References: <741cb38.9a01.18e13ccd4d1.Webtop.119@btinternet.com>
Message-ID: <519a82b7-6ad5-49ee-9c65-c93df7cb0898@code2001.com>


Here's an idea.  It's called mark-up, but most people currently spell it
as "markup".

<rotation=90>?</rotation>
<rotation=47>?</rotation>

As proof of concept, this is already working here!  But, alas, only for
a limited subset of Unicode characters.  I tend to think of my
suggestion in this thread as being like putting the cart before the
horse.  But if I had not put my suggestion forward, nobody would have
considered fine glyph rotation to be impossible, because any list member
here could have conjured up something just as elegant in less than a minute.

Seriously, but also in the department of "nobody asked", here's how to
rotate glyphs by any angle:

x1 = x0 cos(θ) - y0 sin(θ)   (Equation 1, calculating the new x co-ordinate)
y1 = x0 sin(θ) + y0 cos(θ)   (Equation 2, calculating the new y co-ordinate)

where
θ (theta)      = degrees of desired rotation
x0,y0          = the original x and y co-ordinates
x1,y1          = the target x and y co-ordinates (after rotation)
cos(θ),sin(θ)  = the cosine and sine of theta

Equation 1 in plain English:
x1 (the new x coordinate) equals
the old x coordinate times the cosine of the desired rotation angle
minus
the old y coordinate times the sine of the desired rotation angle.

Of course, the glyph has now likely shifted out of its "boundary box"
and will need to be repositioned appropriately.  The lowest values of
any glyph's x and y co-ordinates are stored in the font's glyph data.
The lowest values of x and y in the rotated glyph would need to be
determined programmatically.  Then get the deltas between the original
and rotated x and y minimums.  Apply the x delta to all x co-ordinates
and the y delta to all y co-ordinates in the rotated glyph, and presto!

The above is for "simple" glyphs only.  For "composite" glyphs (or to
accommodate anything done by OpenType features such as glyph
positioning), the font engine would establish an appropriate series of
Cartesian points and then perform the equations on that new data.
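
The rotation and repositioning steps for a simple glyph can be sketched
in Python roughly as follows. The function names and the rectangle used
as a stand-in outline are assumptions for this sketch; a real font
engine would work on the glyph's actual control points.

```python
import math

Point = tuple[float, float]


def rotate(points: list[Point], degrees: float) -> list[Point]:
    """Apply Equations 1 and 2 to every point (counterclockwise if positive)."""
    t = math.radians(degrees)
    cos_t, sin_t = math.cos(t), math.sin(t)
    return [(x * cos_t - y * sin_t, x * sin_t + y * cos_t) for x, y in points]


def reposition(original: list[Point], rotated: list[Point]) -> list[Point]:
    """Shift rotated points so their minimum x and y match the original minimums."""
    dx = min(x for x, _ in original) - min(x for x, _ in rotated)
    dy = min(y for _, y in original) - min(y for _, y in rotated)
    return [(x + dx, y + dy) for x, y in rotated]


# A hypothetical 100 x 200 rectangle standing in for a glyph outline.
outline = [(0.0, 0.0), (100.0, 0.0), (100.0, 200.0), (0.0, 200.0)]
turned = reposition(outline, rotate(outline, 90))
print([(round(x), round(y)) for x, y in turned])
# [(200, 0), (200, 100), (0, 100), (0, 0)]
```

Because the transformation is affine, applying it to the on-curve and
off-curve control points alike yields the rotated curve.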


From wjgo_10009 at btinternet.com  Wed Mar  6 11:33:53 2024
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Wed, 6 Mar 2024 17:33:53 +0000 (GMT)
Subject: Could Unicode deliver the level of paleographic detail needed
 for encoding ancient Egyptian hieroglyphs?
Message-ID: <54633ddb.a389.18e14d42d88.Webtop.119@btinternet.com>


James Kass wrote as follows:

> Seriously, but also in the department of "nobody asked", here's how to
> rotate glyphs by any angle:

The virtual machine would do all of that processing behind the scenes
for each point in the glyph once it received an angle and a Gr command.
I note that the formula quoted rotates the mathematical way, namely
counterclockwise for a positive theta.

> Of course, the glyph has now likely shifted out of its "boundary box"
> and will need to be repositioned appropriately.

Possibly. Yet this need not necessarily be a problem because the Gs
command could have been used to scale the glyph before the rotation and
the Gr command might be defined to rotate about the centre of the
bounding box of the glyph, given that the glyph has been validated
during fontmaking as having no outlying off-curve points.
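
A minimal Python sketch of scaling and then rotating about the centre
of the bounding box is given here. It assumes that Gs scales to the
stated percentage of the original size and that both operations use the
box enclosing all points, on-curve and off-curve alike; the function
names are chosen for this sketch only.

```python
import math

Point = tuple[float, float]


def bbox_centre(points: list[Point]) -> Point:
    """Centre of the box enclosing every point, on-curve and off-curve alike."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)


def scale_about_centre(points: list[Point], percent: float) -> list[Point]:
    """Hypothetical Gs: scale to the given percentage of the original size."""
    cx, cy = bbox_centre(points)
    f = percent / 100
    return [(cx + (x - cx) * f, cy + (y - cy) * f) for x, y in points]


def rotate_about_centre(points: list[Point], degrees: float) -> list[Point]:
    """Hypothetical Gr: rotate counterclockwise about the bounding-box centre."""
    cx, cy = bbox_centre(points)
    t = math.radians(degrees)
    cos_t, sin_t = math.cos(t), math.sin(t)
    return [
        (cx + (x - cx) * cos_t - (y - cy) * sin_t,
         cy + (x - cx) * sin_t + (y - cy) * cos_t)
        for x, y in points
    ]


# A hypothetical 100 x 200 rectangle; the commands would be 50Gs90Gr.
outline = [(0.0, 0.0), (100.0, 0.0), (100.0, 200.0), (0.0, 200.0)]
result = rotate_about_centre(scale_about_centre(outline, 50), 90)
print([(round(x), round(y)) for x, y in result])
# [(100, 75), (100, 125), (0, 125), (0, 75)]
```

In this example the scaled and rotated rectangle still lies within the
original 100 x 200 box, so no repositioning is needed.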

William Overington

Wednesday 6 March 2024

From jameskass at code2001.com  Wed Mar  6 13:07:47 2024
From: jameskass at code2001.com (James Kass)
Date: Wed, 6 Mar 2024 19:07:47 +0000
Subject: Could Unicode deliver the level of paleographic detail needed for
 encoding ancient Egyptian hieroglyphs?
In-Reply-To: <DS0PR12MB7535DB5DB438C2800F2C17C786212@DS0PR12MB7535.namprd12.prod.outlook.com>
References: <70349490.62d6.18e09a157f3.Webtop.119@btinternet.com>
 <0ebfbd08-f3fa-40ac-a74a-2a77134fac13@ix.netcom.com>
 <c5a69f24-354d-4062-878c-0c6917ad3c49@code2001.com>
 <DS0PR12MB7535DB5DB438C2800F2C17C786212@DS0PR12MB7535.namprd12.prod.outlook.com>
Message-ID: <d05faf94-4bc7-4ffe-9389-caa276d2b4dc@code2001.com>


On 2024-03-06 4:30 PM, Peter Constable via Unicode wrote:
> There is a further misconception here: glyph outlines are defined as
> Bezier splines, which get represented as a sequence of (x,y) Cartesian
> coordinates for Bezier control points. However, some glyph IDs can
> describe the outline by reference to other glyph IDs - these are
> referred to as composite glyph descriptions. A composite glyph
> description can apply a 2x2 matrix for an affine transformation.
> ("Affine" means that parallel lines remain parallel.) The 2x2 matrix can
>   be used to specify a rotation. So, while the outline of a hieroglyph
> might involve a large number of control points, to have a separate glyph
>   that is a rotation of that hieroglyph requires only a small amount of
> additional data.
>
> Of course, if there were 10,000 rotational variants, that would add a
> lot of data. But the statement from that document didn't suggest a large
>   number of rotational variants. It makes a generic statement, "Rotations
>   entail...", which suggests one or more additional variants. Since it is
>   commenting on L2/21-248 which proposed three rotational variants
> (possibly extended in the future to 7), it has 3 in mind, not 10,000.
> Adding 3 rotational variants using composite glyph descriptions would be
>   a fairly small increase in data.
>
>
> (I'm commenting here only on the OpenType font format, not on the
> encoding of rotational variants of hieroglyphs.)

Exactly.  I'm commenting here only on fine rotation for any glyph
because the Egyptologists apparently neither want nor need such
granularity.  But the concept of handling any glyph was brought up by
the OP.  If anybody wanted such a feature, it would already have been
accomplished.  And it would most likely have been relegated to rich-text.

Adding even only three rotational variants to a large font, like a CJK
font, would be non-trivial.  Adding the gamut of possible degrees to a
font would be absurd.  So any fine rotation support would of necessity
be handled by the font engine instead of the font file in this scenario.

My apologies for jumping into this rabbit hole; I should have exercised 
a little self-restraint.


From pgcon6 at msn.com  Thu Mar  7 10:48:57 2024
From: pgcon6 at msn.com (Peter Constable)
Date: Thu, 7 Mar 2024 16:48:57 +0000
Subject: Could Unicode deliver the level of paleographic detail needed for
 encoding ancient Egyptian hieroglyphs?
In-Reply-To: <54633ddb.a389.18e14d42d88.Webtop.119@btinternet.com>
References: <54633ddb.a389.18e14d42d88.Webtop.119@btinternet.com>
Message-ID: <DS0PR12MB7535726AE55FA0C3CBE58BF386202@DS0PR12MB7535.namprd12.prod.outlook.com>

William, this seems to be an attempt to reinvent SVG, but limited to text elements and without the XML apparatus. While perhaps interesting as a thought experiment, I don't think you'll get much interest unless you can provide compelling reasons why yet another format is needed.

(The OpenType COLRv1 table format was designed to supersede the SVG table and had, in my opinion, some pretty compelling reasons.  But there were still some who said it wasn't needed.)

In any case, what you're discussing is a higher-level protocol than Unicode. _Unicode_ will not be delivering this any time soon.


Peter



From wjgo_10009 at btinternet.com  Thu Mar  7 12:56:09 2024
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Thu, 7 Mar 2024 18:56:09 +0000 (GMT)
Subject: Could Unicode deliver the level of paleographic detail needed
 for encoding ancient Egyptian hieroglyphs?
Message-ID: <1dc42336.c086.18e1a45da38.Webtop.119@btinternet.com>


I know hardly anything about the structure of SVG and XML so I cannot
comment on any perceived similarities to what I have written.

Peter Constable wrote:

> While perhaps interesting as a thought experiment, I don't think
> you'll get much interest unless you can provide compelling reasons why
> yet another format is needed.

What happened was that I was looking through the UTC Current Document
Register and I had a look at the document mentioned in the first post in
this thread.

https://www.unicode.org/L2/L2024/24045-ancient-egyptian-rotations.pdf

I saw the statement in the conclusions "and suggest a level of
palaeographic detail that Unicode cannot deliver."

It occurred to me that Unicode could possibly deliver that level of
palaeographic detail if a subset of the system that I had written about
in a chapter of my first novel, with the addition of a Gr command for
Glyph rotation, were implemented using Unicode tag characters. (I later
added a Gm command for Glyph mirroring horizontally.)

So I wrote about my opinion and posted in this mailing list. Some of us
are having a discussion and maybe some other people are also reading the
thread.

I would be pleased if people are interested and maybe, just maybe, what
I have written may some day help in the encoding of palaeographic detail
in Unicode, but if there is no interest, then there we are. I have no
expertise in Egyptology. My research interests are in other topics.

> In any case, what you're discussing is a higher-level protocol than
> Unicode.

I am thinking here of tag characters being used in a Unicode plain text
environment to add capability to what can be encoded in Unicode. Tag
characters are used to encode the Welsh flag.

> _Unicode_ will not be delivering this any time soon.

I have suggested an additional way to encode palaeographic detail in a
Unicode plain text environment. The method is available free to use if
people want to apply it.

William Overington

Thursday 7 March 2024

From asmusf at ix.netcom.com  Thu Mar  7 14:30:04 2024
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Thu, 7 Mar 2024 12:30:04 -0800
Subject: Could Unicode deliver the level of paleographic detail needed for
 encoding ancient Egyptian hieroglyphs?
In-Reply-To: <1dc42336.c086.18e1a45da38.Webtop.119@btinternet.com>
References: <1dc42336.c086.18e1a45da38.Webtop.119@btinternet.com>
Message-ID: <bb0b4391-9d1e-4aa3-abd2-7811eaefa01b@ix.netcom.com>

On 3/7/2024 10:56 AM, William_J_G Overington via Unicode wrote:
> I know hardly anything about the structure of SVG and XML so I cannot 
> comment on any perceived similarities to what I have written.

Which means that you are also not informed of "prior art". Which is not 
usually a good start for any proposal.

A./

From eik at iki.fi  Thu Mar  7 15:19:32 2024
From: eik at iki.fi (eik at iki.fi)
Date: Thu, 7 Mar 2024 23:19:32 +0200
Subject: VS: Could Unicode deliver the level of paleographic detail needed for
 encoding ancient Egyptian hieroglyphs?
In-Reply-To: <bb0b4391-9d1e-4aa3-abd2-7811eaefa01b@ix.netcom.com>
References: <1dc42336.c086.18e1a45da38.Webtop.119@btinternet.com>
 <bb0b4391-9d1e-4aa3-abd2-7811eaefa01b@ix.netcom.com>
Message-ID: <010f01da70d5$268a1720$739e4560$@iki.fi>

I strongly suggest that this thread should be closed, which, for good reasons, would not be the first time for the initiator of this one.

 
Erkki I. Kolehmainen

 

From root at corp.unicode.org  Thu Mar  7 15:31:11 2024
From: root at corp.unicode.org (root at corp.unicode.org)
Date: Thu, 07 Mar 2024 15:31:11 -0600
Subject: VS: Could Unicode deliver the level of paleographic detail needed for
 encoding ancient Egyptian hieroglyphs?
Message-ID: <65ea321f.qeO84VUizcKRc/H3%root@corp.unicode.org>

Hello,

Erkki wrote:

> I strongly suggest that this thread should be closed,
> which, for good reasons, would not be the first time
> for the initiator of this one.

Let us therefore please consider this thread closed.

Warm regards,


From arthur at reutenauer.eu  Thu Mar  7 15:40:31 2024
From: arthur at reutenauer.eu (Arthur Rosendahl)
Date: Thu, 7 Mar 2024 22:40:31 +0100
Subject: Could Unicode deliver the level of paleographic detail needed
 for encoding ancient Egyptian hieroglyphs?
In-Reply-To: <1dc42336.c086.18e1a45da38.Webtop.119@btinternet.com>
References: <1dc42336.c086.18e1a45da38.Webtop.119@btinternet.com>
Message-ID: <Zeo0TwQTYVi71bif@phare.normalesup.org>

On Thu, Mar 07, 2024 at 06:56:09PM +0000, William_J_G Overington via Unicode wrote:
> What happened was that I was looking through the UTC Current Document
> Register and I had a look at the document mentioned in the first post in
> this thread.
>
> https://www.unicode.org/L2/L2024/24045-ancient-egyptian-rotations.pdf
>
> I saw the statement in the conclusions "and suggest a level of palaeographic
> detail that Unicode cannot deliver."

  Did you read the rest of the document?  It is very clear that the
author is not suggesting that Unicode should support that level of
palaeographic detail.  Quite the contrary: he argues against the
introduction of additional variation selectors to represent rotations.
That stance is fairly obvious by reading only the paragraph from which
you extracted the quote with which you started this thread.  For the
benefit of the list, the context is

	7 Conclusions

	The use of rotations in Unicode deserves reconsideration.  Some
	of the rotations that have already been added to
	StandardizedVariants.txt undermine common assumptions about
	Unicode and Unicode fonts, one of which is that the validity of
	an encoding does not depend on the choice of font.  New
	rotations that have recently been proposed also have the
	potential to misrepresent what may be mere inaccuracies in
	modern handwritten transcriptions, and suggest a level of
	palaeographic detail that Unicode cannot deliver.

  The author then mitigates this conclusion somewhat in the next
paragraph, but overall the intent is clear: there is no need to
represent arbitrary rotations of Ancient Egyptian hieroglyphs in
Unicode.


	Arthur

From yuri.sukhov at gmail.com  Mon Mar 11 19:06:07 2024
From: yuri.sukhov at gmail.com (Yuri Sukhov)
Date: Tue, 12 Mar 2024 04:06:07 +0400
Subject: Identifier caseless matching without toNFKC_Casefold
Message-ID: <CADbd57doUgr5jtXkunWp5n=y1zh9NA9KUu4D8oQZiKnmt=igHA@mail.gmail.com>

Hi,

I'm implementing caseless matching for strings used as identifiers. I'm
aware that the NFKC_Casefold mapping and the related toNFKC_Casefold() string
transform are designed for such a scenario. Unfortunately, the language and
libraries I'm using do not implement toNFKC_Casefold(), so I'm looking for
an alternative approach.

My use case does not seem to require the removal of default-ignorables; for
now I'm only concerned with case and compatibility variations. It looks
like the definition of the compatibility caseless match is what I need:

A string X is a compatibility caseless match for a string Y if and only if:
NFKD(toCasefold(NFKD(toCasefold(NFD(X))))) =
NFKD(toCasefold(NFKD(toCasefold(NFD(Y)))))

However, I can't seem to find a case where that extra cycle of
folding/normalization makes a difference. It seems to me that the same
result - compatibility caseless match - can be achieved with a simpler
approach:

NFC(toCasefold(NFKD(X)))

Basically, I think about it as 1) removing the compatibility variations by
normalizing with decomposition, 2) then removing the case differences from
this decomposed sequence, 3) and finally storing a folded string in a
potentially shorter NFC form.

It looks like it checks all the boxes, and my - likely naive - testing
shows that

NFC(toCasefold(NFKD(X))) = NFKD(toCasefold(NFKD(toCasefold(NFD(X)))))

I'm sure I'm missing something, and would appreciate an explanation
why/when this won't work.
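
One way to probe this is to run both transforms over known troublesome inputs. The following is a minimal Python sketch, not a proof. It assumes that Python's str.casefold() (full case folding) is an acceptable stand-in for toCasefold(), and the names standard_key, short_key and keys_agree are made up for illustration. Because the two transforms end in different normalization forms (NFKD versus NFC), the sketch brings the shorter key to NFKD before comparing.

```python
import unicodedata


def standard_key(s: str) -> str:
    """NFKD(toCasefold(NFKD(toCasefold(NFD(X))))), as quoted above."""
    step = unicodedata.normalize("NFD", s).casefold()
    step = unicodedata.normalize("NFKD", step).casefold()
    return unicodedata.normalize("NFKD", step)


def short_key(s: str) -> str:
    """NFC(toCasefold(NFKD(X))), the proposed shorter transform."""
    folded = unicodedata.normalize("NFKD", s).casefold()
    return unicodedata.normalize("NFC", folded)


def keys_agree(s: str) -> bool:
    # The keys end in different forms (NFKD vs NFC), so bring the short
    # key to NFKD before comparing it with the standard key.
    return unicodedata.normalize("NFKD", short_key(s)) == standard_key(s)


# A few inputs that commonly trip up caseless matching.
samples = ["MHz", "\u3392", "Stra\u00dfe", "\ufb01", "\u1fc3", "\u0345"]
for s in samples:
    print(ascii(s), ascii(standard_key(s)), ascii(short_key(s)), keys_agree(s))
```

Agreement on a handful of samples says nothing about the full repertoire. A loop over every code point, plus multi-character sequences such as letters followed by U+0345, would be a stronger check.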

Yuri

From addisoni18n at gmail.com  Tue Mar 12 08:44:46 2024
From: addisoni18n at gmail.com (Addison Phillips)
Date: Tue, 12 Mar 2024 06:44:46 -0700
Subject: Identifier caseless matching without toNFKC_Casefold
In-Reply-To: <CADbd57doUgr5jtXkunWp5n=y1zh9NA9KUu4D8oQZiKnmt=igHA@mail.gmail.com>
References: <CADbd57doUgr5jtXkunWp5n=y1zh9NA9KUu4D8oQZiKnmt=igHA@mail.gmail.com>
Message-ID: <001f01da7483$72ecf2c0$58c6d840$@gmail.com>

Hi Yuri,

The part of the W3C "Character Model" called "String Matching for the Web" illustrates the case in this section:

https://www.w3.org/TR/charmod-norm/#normalizationAndCasefold

You might find the rest of the document useful in your work as well.

Best regards,

Addison

Addison Phillips
Chair (W3C Internationalization WG)

Internationalization is not a feature.
It is an architecture.
 

From julesbertholet at quoi.xyz  Tue Mar 12 11:57:19 2024
From: julesbertholet at quoi.xyz (Jules Bertholet)
Date: Tue, 12 Mar 2024 16:57:19 +0000 (UTC)
Subject: UAX 11 and new Unicode 16 variation sequences
Message-ID: <83V8AS.L3FMFT6ZAUHA@quoi.xyz>

Unicode 16 will add 8 new standardized variation sequences: 
https://unicode.org/alloc/Pipeline.html#variation_sequences,
https://www.unicode.org/L2/L2023/23212r-quotes-svs-proposal.pdf

These new sequences have the peculiarity that they affect the width of 
the character they modify. In previous versions of Unicode, this was 
only true of emoji variation sequences. However, UAX 11 "East Asian 
Width" (https://www.unicode.org/reports/tr11/) currently does not 
account for the possibility of a non-emoji variation sequence affecting 
character width, and no update to UAX 11 has yet been proposed for 
Unicode 16. Is there a plan to address this gap?

Jules Bertholet


From yuri.sukhov at gmail.com  Tue Mar 12 15:08:18 2024
From: yuri.sukhov at gmail.com (Yuri Sukhov)
Date: Wed, 13 Mar 2024 00:08:18 +0400
Subject: Identifier caseless matching without toNFKC_Casefold
In-Reply-To: <001f01da7483$72ecf2c0$58c6d840$@gmail.com>
References: <CADbd57doUgr5jtXkunWp5n=y1zh9NA9KUu4D8oQZiKnmt=igHA@mail.gmail.com>
 <001f01da7483$72ecf2c0$58c6d840$@gmail.com>
Message-ID: <CADbd57eE1kTSkA-12UPzgUGHMS=OcbJPz6JwUQHyLWO5+zo5Nw@mail.gmail.com>

Hi Addison,

Thank you for the link; the examples were very useful. The more I look
at them, the more convinced I become that the compatibility caseless
match transform

NFKD(toCasefold(NFKD(toCasefold(NFD(X)))))

is unnecessarily expensive.

The second (outer) casefold+normalization cycle can be avoided if we
perform the initial NFKD normalization *before* the first casefold. Doing
compatibility decomposition before the casefolding eliminates the
problem with the U+3392 character illustrated in example 19. And since it's
recommended to decompose before the initial casefold anyway (the
Greek ypogegrammeni/iota issue), NFKD normalization as the first step also
covers that case.
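
The U+3392 case can be traced step by step. This is a small Python sketch, assuming str.casefold() stands in for toCasefold(); the variable names are illustrative only. It shows that folding before compatibility decomposition leaves uppercase letters behind, while decomposing first does not.

```python
import unicodedata

square_mhz = "\u3392"  # SQUARE MHZ

# Casefold first: U+3392 has no case folding of its own, so the uppercase
# letters only appear after the compatibility decomposition.
fold_then_decompose = unicodedata.normalize("NFKD", square_mhz.casefold())

# Decompose first: the letters M, H, z exist before folding, so a single
# casefold is enough for this input.
decompose_then_fold = unicodedata.normalize("NFKD", square_mhz).casefold()

print(fold_then_decompose)  # MHz
print(decompose_then_fold)  # mhz
```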

As a result, the transform is reduced from 3 normalizations + 2 casefolds
to 2 normalizations and 1 casefold:

NFKD(toCasefold(NFKD(toCasefold(NFD(X))))) --> NFC(toCasefold(NFKD(X)))

Am I missing any non-trivial cases? For the examples on the mentioned page,
as well as for the other "problematic" cases I've seen elsewhere,
this lighter transform produces the same output as the more expensive one
from the standard.

Kind regards,
Yuri



From hujialun at comp.nus.edu.sg  Wed Mar 13 12:29:04 2024
From: hujialun at comp.nus.edu.sg (Hu Jialun)
Date: Thu, 14 Mar 2024 01:29:04 +0800
Subject: Correct way to encode mixed width text in Unicode?
Message-ID: <4fae262b5e5469c6bef51fe80d45eb6d@comp.nus.edu.sg>

 From what I read [^1], the fullwidth glyphs in Unicode are provided
solely for backward compatibility and lossless roundtrip with legacy
standards such as Shift-JIS. The rationale [^2] seems to be that Unicode
views it as a presentational issue that is better dealt with by the
renderer based on linguistic context, and use of such characters is
generally discouraged. In some cases, no compatibility character is
provided at all, such as fullwidth left/right single/double quotation
marks, because no legacy encoding contains both full- and half-width
forms, and Unicode explicitly rejects encoding any more such characters.

Unicode recommends in the same document,

     Ambiguous quotation marks are generally resolved to wide when they
     enclose and are adjacent to a wide character, and to narrow
     otherwise.

However, there are cases where the width is tricky to resolve, and
current fonts and renderer implementations sometimes get it wrong:

     ??????????????????N?????Nostalgia?????

     ?Make a wish! Make a wish!????????

     The term ?char kway teow? is a transliteration of the Chinese
     characters ?????.

     ????????Hamlet??????Polonius (II.ii.) ?Though this be
     madness, yet there is method in?t.???

     ???????????????????????????????????????
     ??????????????????????

It seems that the recommended algorithm fails in such cases (rendered
inconsistently e.g. with fullwidth left quote and halfwidth right
quote), and such cases may just be too complex for an algorithm to
render without intricate and fragile rulesets for the language itself.
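
The inconsistency is easy to reproduce with a naive reading of that rule. Below is a minimal Python sketch, assuming a resolver that only looks at the character immediately inside each quotation mark. The function resolve_quotes and the sample string are invented for illustration and are not taken from UAX #11 or from any real renderer.

```python
import unicodedata

OPENING = "\u2018\u201c"   # left single / left double quotation mark
CLOSING = "\u2019\u201d"   # right single / right double quotation mark


def is_wide(ch: str) -> bool:
    """True for East_Asian_Width Wide (W) or Fullwidth (F)."""
    return unicodedata.east_asian_width(ch) in ("W", "F")


def resolve_quotes(text: str) -> list[tuple[str, str]]:
    """Resolve each ambiguous quote from the character just inside it."""
    results = []
    for i, ch in enumerate(text):
        if ch in OPENING:
            neighbour = text[i + 1] if i + 1 < len(text) else ""
        elif ch in CLOSING:
            neighbour = text[i - 1] if i > 0 else ""
        else:
            continue
        width = "wide" if neighbour and is_wide(neighbour) else "narrow"
        results.append((ch, width))
    return results


# The quotes themselves are Ambiguous (A); corner brackets are Wide (W).
print(unicodedata.east_asian_width("\u201c"), unicodedata.east_asian_width("\u300c"))

# Latin text at the start, CJK at the end: the pair resolves inconsistently.
print(resolve_quotes("\u201cUnicode\u6807\u51c6\u201d"))
```

With this resolver the opening mark comes out narrow and the closing mark wide, which is the kind of mismatched pair described above. A resolver that looked at the whole quoted span or at the surrounding language could do better, at the cost of more rules.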

This issue mainly affects Simplified Chinese but not other East Asian
languages, due to the fact that Traditional Chinese, Japanese and
vertically written Korean commonly use the U+300C-300F CORNER BRACKET
family (East_Asian_Width=Wide).

My question is thus, is there a common way to provide a hint in
plaintext for the width of an ambiguous width character, maybe as a
Unicode variation selector or something like RLM?

[^1]: https://harjit.moe/hwfwblame.html
[^2]: https://www.unicode.org/reports/tr11/tr11-41.html#Relation
Originally asked at:
<https://superuser.com/questions/1828050/correct-way-to-encode-mixed-width-text-in-unicode>

~hujialun

From ecm.unicode at gmail.com  Wed Mar 13 16:07:12 2024
From: ecm.unicode at gmail.com (Erik Carvalhal Miller)
Date: Wed, 13 Mar 2024 17:07:12 -0400
Subject: Correct way to encode mixed width text in Unicode?
In-Reply-To: <4fae262b5e5469c6bef51fe80d45eb6d@comp.nus.edu.sg>
References: <4fae262b5e5469c6bef51fe80d45eb6d@comp.nus.edu.sg>
Message-ID: <CAJTfRPG_oAGEJM9gxuyzqKNf25oVH=du6HD-RpRuPifMieDX6A@mail.gmail.com>

It's planned.  See <https://www.unicode.org/L2/L2023/23231.htm#177-C36>.
