<div dir="ltr"><div dir="ltr">Itʼs planned. See <<a href="https://www.unicode.org/L2/L2023/23231.htm#177-C36">https://www.unicode.org/L2/L2023/23231.htm#177-C36</a>>.<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Mar 13, 2024 at 1:50 PM Hu Jialun via Unicode <<a href="mailto:unicode@corp.unicode.org">unicode@corp.unicode.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> From what I read [^1], the fullwidth glyphs in Unicode are provided<br>
solely for backward compatibility and lossless roundtrip with legacy<br>
standards such as Shift-JIS. The rationale [^2] seems to be that Unicode<br>
views it as a presentational issue that is better dealt with by the<br>
renderer based on linguistic context, and use of such characters is<br>
generally discouraged. In some cases no compatibility character is<br>
provided at all, for example for the fullwidth left/right single/double<br>
quotation marks, because no legacy encoding contains both fullwidth and<br>
halfwidth forms, and Unicode explicitly rules out encoding any more such<br>
characters.<br>
<br>
The same document recommends:<br>
<br>
Ambiguous quotation marks are generally resolved to wide when they<br>
enclose and are adjacent to a wide character, and to narrow<br>
otherwise.<br>
<br>
However, in some cases the width is tricky to resolve, and current<br>
fonts and renderer implementations sometimes get it wrong:<br>
<br>
他们一致认为,目前最大的敌人无疑是“N问题”,即Nostalgia,思乡病。<br>
<br>
“Make a wish! Make a wish!”琳琳和盼盼喊。<br>
<br>
The term “char kway teow” is a transliteration of the Chinese<br>
characters “炒粿條”.<br>
<br>
教授昨天讲了:“Hamlet的原文其实是Polonius (II.ii.) ‘Though this be<br>
madness, yet there is method in‘t.’“。<br>
<br>
在大韩民国,这个语言的名称是“한국어/韓國語”。在中国大陆、香港、澳门的名称是<br>
“韩语”或“朝鲜语”。台湾则通称为“韩语”。<br>
<br>
The recommended algorithm seems to fail in such cases (for example,<br>
rendering a fullwidth left quote with a halfwidth right quote), and they<br>
may simply be too complex to resolve without intricate, fragile,<br>
language-specific rules.<br>
<br>
This issue mainly affects Simplified Chinese rather than other East<br>
Asian languages, because Traditional Chinese, Japanese and vertically<br>
written Korean commonly use the U+300C..U+300F CORNER BRACKET family<br>
(East_Asian_Width=Wide).<br>
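<br>
The property values can be checked with a short sketch using Python's<br>
unicodedata module. The helper name width_table is made up for this<br>
example, and the values come from whichever Unicode version the Python<br>
build bundles:<br>

```python
import unicodedata


def width_table(codepoints):
    """Map each code point to its East_Asian_Width property value."""
    return {
        f"U+{cp:04X}": unicodedata.east_asian_width(chr(cp))
        for cp in codepoints
    }


# Curly quotation marks used in Simplified Chinese text.
quotes = width_table([0x2018, 0x2019, 0x201C, 0x201D])
# Corner brackets U+300C..U+300F.
corners = width_table(range(0x300C, 0x3010))

print(quotes)
print(corners)
```

The quotation marks report A (Ambiguous), so their width depends on<br>
context, while the corner brackets report W (Wide) unconditionally.<br>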
<br>
So my question is: is there a common way to hint, in plain text, the<br>
width of an ambiguous-width character, perhaps with a Unicode variation<br>
selector or a control character like RLM?<br>
<br>
[^1]: <a href="https://harjit.moe/hwfwblame.html" rel="noreferrer" target="_blank">https://harjit.moe/hwfwblame.html</a><br>
[^2]: <a href="https://www.unicode.org/reports/tr11/tr11-41.html#Relation" rel="noreferrer" target="_blank">https://www.unicode.org/reports/tr11/tr11-41.html#Relation</a><br>
Originally asked at:<br>
<<a href="https://superuser.com/questions/1828050/correct-way-to-encode-mixed-width-text-in-unicode" rel="noreferrer" target="_blank">https://superuser.com/questions/1828050/correct-way-to-encode-mixed-width-text-in-unicode</a>><br>
<br>
~hujialun<br>
</blockquote></div></div>