<div dir="ltr"><div dir="ltr">It’s planned. See <<a href="https://www.unicode.org/L2/L2023/23231.htm#177-C36">https://www.unicode.org/L2/L2023/23231.htm#177-C36</a>>.<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Mar 13, 2024 at 1:50 PM Hu Jialun via Unicode <<a href="mailto:unicode@corp.unicode.org">unicode@corp.unicode.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">From what I read [^1], the fullwidth characters in Unicode are provided<br>
solely for backward compatibility and lossless round-trip conversion with<br>
legacy standards such as Shift-JIS. The rationale [^2] seems to be that<br>
Unicode treats width as a presentation issue, better handled by the<br>
renderer based on linguistic context, and the use of such characters is<br>
generally discouraged. In some cases, such as fullwidth left/right<br>
single/double quotation marks, no compatibility character is provided at<br>
all, because no legacy encoding contains both fullwidth and halfwidth<br>
forms of them, and Unicode explicitly rejects encoding any more such<br>
characters.<br>

<br>

In the same document, Unicode recommends:<br>

<br>

     Ambiguous quotation marks are generally resolved to wide when they<br>

     enclose and are adjacent to a wide character, and to narrow<br>

     otherwise.<br>

<br>
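A minimal sketch of that rule in Python, using only the standard<br>
unicodedata module. It assumes one particular reading of "enclose and are<br>
adjacent to": an opening mark takes the width of the character after it,<br>
and a closing mark takes the width of the character before it. The helper<br>
names is_wide and resolve_quote_widths are hypothetical and do not come<br>
from UAX #11 or from any renderer.<br>
<br>
<pre>
```python
import unicodedata

# Curly quotation marks; all four have East_Asian_Width=Ambiguous ("A").
OPENING = {"\u2018", "\u201C"}  # LEFT SINGLE / LEFT DOUBLE QUOTATION MARK
CLOSING = {"\u2019", "\u201D"}  # RIGHT SINGLE / RIGHT DOUBLE QUOTATION MARK


def is_wide(ch):
    """True if ch is East_Asian_Width Wide ("W") or Fullwidth ("F")."""
    return unicodedata.east_asian_width(ch) in ("W", "F")


def resolve_quote_widths(text):
    """Map the index of each curly quotation mark to "wide" or "narrow".

    Assumed reading of the recommendation: an opening mark looks at the
    character after it, a closing mark at the character before it (the
    character it encloses). No font or renderer behaviour is modelled.
    """
    result = {}
    for i, ch in enumerate(text):
        if ch in OPENING:
            enclosed = text[i + 1:i + 2]  # empty string at end of text
        elif ch in CLOSING:
            enclosed = text[max(i - 1, 0):i]  # empty string at start of text
        else:
            continue
        result[i] = "wide" if enclosed and is_wide(enclosed) else "narrow"
    return result


# A fragment of the first example below: one pair, two different widths.
print(resolve_quote_widths("是“N问题”"))  # {1: 'narrow', 5: 'wide'}
```
</pre>
On that fragment the opening mark encloses the narrow letter N and the<br>
closing mark encloses the wide 题, so the two halves of one pair resolve<br>
differently. Because U+2019 also serves as the apostrophe, the sketch would<br>
likewise treat the apostrophe in a word such as in’t as a closing<br>
quotation mark, which shows how fragile such heuristics are.<br>
<br>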

However, in some cases the width is tricky to resolve, and current fonts<br>
and renderer implementations sometimes produce incorrect results:<br>

<br>

     他们一致认为，目前最大的敌人无疑是“N问题”，即Nostalgia，思乡病。<br>

<br>

     “Make a wish! Make a wish!”琳琳和盼盼喊。<br>

<br>

     The term “char kway teow” is a transliteration of the Chinese<br>

     characters “炒粿條”.<br>

<br>

     教授昨天讲了：“Hamlet的原文其实是Polonius (II.ii.) ‘Though this be<br>

     madness, yet there is method in’t.’”。<br>

<br>

     在大韩民国，这个语言的名称是“한국어／韓國語”。在中国大陆、香港、澳门的名称是<br>

     “韩语”或“朝鲜语”。台湾则通称为“韩语”。<br>

<br>

It seems that the recommended algorithm fails in such cases, rendering<br>
them inconsistently (e.g., with a fullwidth left quote and a halfwidth<br>
right quote), and such cases may simply be too complex for an algorithm<br>
to handle without intricate and fragile language-specific rules.<br>

<br>

This issue mainly affects Simplified Chinese rather than other East Asian<br>
languages, because Traditional Chinese, Japanese, and vertically written<br>
Korean commonly use the CORNER BRACKET family, U+300C through U+300F<br>
(East_Asian_Width=Wide), instead.<br>

<br>
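For reference, these property values can be checked with Python's standard<br>
unicodedata module. This is a small illustrative sketch; the helper name<br>
east_asian_widths is hypothetical, and the results reflect whichever<br>
Unicode version is bundled with the interpreter:<br>
<br>
<pre>
```python
import unicodedata

# Curly quotation marks (U+2018, U+2019, U+201C, U+201D) versus the
# CORNER BRACKET family (U+300C through U+300F).
SAMPLES = "\u2018\u2019\u201C\u201D\u300C\u300D\u300E\u300F"


def east_asian_widths(chars):
    """Map "U+XXXX NAME" to the character's East_Asian_Width value."""
    return {
        f"U+{ord(c):04X} {unicodedata.name(c)}": unicodedata.east_asian_width(c)
        for c in chars
    }


for label, width in east_asian_widths(SAMPLES).items():
    print(f"{label}: {width}")
```
</pre>
The four curly quotation marks report A (Ambiguous), while the four corner<br>
brackets report W (Wide), so only the corner brackets have a width that<br>
does not depend on context.<br>
<br>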

My question, then: is there a common way to hint, in plain text, at the<br>
width of an ambiguous-width character, perhaps with a Unicode variation<br>
selector or with something like RLM?<br>

<br>

[^1]: <a href="https://harjit.moe/hwfwblame.html" rel="noreferrer" target="_blank">https://harjit.moe/hwfwblame.html</a><br>

[^2]: <a href="https://www.unicode.org/reports/tr11/tr11-41.html#Relation" rel="noreferrer" target="_blank">https://www.unicode.org/reports/tr11/tr11-41.html#Relation</a><br>

Originally asked at:<br>

<<a href="https://superuser.com/questions/1828050/correct-way-to-encode-mixed-width-text-in-unicode" rel="noreferrer" target="_blank">https://superuser.com/questions/1828050/correct-way-to-encode-mixed-width-text-in-unicode</a>><br>

<br>

~hujialun<br>

</blockquote></div></div>