Should U+3248 ... U+324F be wide characters?
Mike FABIAN via Unicode
unicode at unicode.org
Fri Aug 18 06:22:48 CDT 2017
"Asmus Freytag (c) via Unicode" <unicode at unicode.org> wrote:
> On 8/17/2017 7:24 AM, Mike FABIAN wrote:
>> Asmus Freytag via Unicode <unicode at unicode.org> wrote:
>>> On 8/16/2017 6:26 AM, Mike FABIAN via Unicode wrote:
>>> EastAsianWidth.txt contains:
>>> 3248..324F;A # No  CIRCLED NUMBER TEN ON BLACK SQUARE..CIRCLED NUMBER EIGHTY ON BLACK SQUARE
>>> i.e. it classifies the width of the characters
>>> between U+3248 and U+324F as ambiguous.
>>> Is this really correct? Shouldn’t they be “W”, i.e. wide?
>>> In most fonts these characters seem to be square shaped
>>> wide characters.
>>> "W" not only implies display width, but also a different treatment in the context of line
>>> breaking and vertical layout of text.
>>> "W" characters behave more like Ideographs, for the most part, while "N" are treated as
>>> forming words (for the most part).
>> Most emoji now have "W", for example:
>> 1F600..1F64F;W # So  GRINNING FACE..PERSON WITH FOLDED HANDS
>> That seems correct because emoji behave more like Ideographs.
>> Isn’t this the same for “CIRCLED NUMBER TEN ON BLACK SQUARE”?
>> This seems to me also more like an Ideograph.
>>> "A" means, you get to decide whether to treat these as "W" or "N" based on context. If
>>> used in a non-ideographic context, they behave like all other symbols (but happen to fill
>>> an EM square).
> "A" means, you get to decide whether to treat these as "W" or "N" based on context.
> There's really no strong need to change an "A" towards "W", because
> "A" doesn't get in your way if you decided that "W" works better for you.
> Remember that all the EAW properties are supposed to be "resolved"
> down to W or N. For some, like Na that resolution is deterministic,
> for A it is context/application dependent, but when you finally
> process your data, only W(ide) or N(arrow) remain after resolution.
OK, that means that it is OK to decide that in the context of glibc
resolving these to W(ide) is best, right?
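
To make the resolution step concrete, here is a rough sketch of how the
classes could be resolved down to W or N. It is not glibc's actual
implementation. It assumes Python's standard unicodedata module; the
helper name resolved_width and its ambiguous_is_wide flag are
hypothetical, chosen only for illustration. It also ignores zero-width
cases such as combining marks.

```python
import unicodedata


def resolved_width(ch: str, ambiguous_is_wide: bool = False) -> int:
    """Resolve the East_Asian_Width class of one character to a column count.

    Hypothetical helper: returns 2 for characters resolved to W(ide) and
    1 for characters resolved to N(arrow). Combining marks and control
    characters are not special-cased here.
    """
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        # Wide and Fullwidth always resolve to wide.
        return 2
    if eaw == "A":
        # Ambiguous: the application (or context) decides.
        return 2 if ambiguous_is_wide else 1
    # Na, H and N always resolve to narrow.
    return 1


if __name__ == "__main__":
    # U+3248 CIRCLED NUMBER TEN ON BLACK SQUARE
    ch = "\u3248"
    print(unicodedata.east_asian_width(ch))
    print(resolved_width(ch), resolved_width(ch, ambiguous_is_wide=True))
```

With ambiguous_is_wide=True, the characters in 3248..324F come out as two
columns, which is the choice being asked about here; with the default
they come out as one. The classes reported depend on the Unicode version
bundled with the Python interpreter.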
Mike FABIAN <mfabian at redhat.com>