Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?
kojiishi at gmail.com
Fri May 1 07:33:49 CDT 2015
I support Makoto's proposed change. I doubt anybody appreciates the current behavior: it is either worked around locally (Firefox, IE) or goes unnoticed (Chrome). Rather than implementing yet another workaround in Chrome, I would like to see it finally fixed after 15 years.
If this were an issue where 5 people say break and 5 say not to, or even, considering the long life of the bug, 9 say break and 1 says not to, I would understand that Ken’s answer might make more sense. However, I’m quite sure that this is a 10-0 issue. Everyone using UAX#14 has to choose among tailoring, leaving it unnoticed, or won’t-fix. I think that kind of thing is better fixed at the source.
Half-width CJK characters should have the same line breaking class as their wide counterparts. From that point of view, half-width Hangul being AL is actually correct. (Note that this is not the same situation as full-width characters, which often have different classes from their narrow counterparts.)
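As a rough sketch of the idea above (a half-width character takes the class of its wide counterpart), a tailoring could resolve each half-width character to its wide form and look up the class there. The table BASE_LINE_BREAK and both function names are hypothetical, and the table holds only two entries for illustration; real data would come from the UAX#14 data file. The only library call relied on is Python's unicodedata.decomposition.

```python
import unicodedata

# Illustrative two-entry table, NOT the real UAX#14 data.
BASE_LINE_BREAK = {
    "\u30A2": "ID",  # KATAKANA LETTER A (wide)
    "\uFF71": "AL",  # HALFWIDTH KATAKANA LETTER A, as discussed in this thread
}


def wide_counterpart(ch):
    """Return the wide form of a half-width character, or ch itself.

    Uses the <narrow> compatibility decomposition from the Unicode
    Character Database, e.g. U+FF71 decomposes to "<narrow> 30A2".
    """
    decomp = unicodedata.decomposition(ch)
    if decomp.startswith("<narrow> "):
        return chr(int(decomp.split()[1], 16))
    return ch


def tailored_line_break(ch, base=BASE_LINE_BREAK, default="AL"):
    """Hypothetical tailoring: prefer the class of the wide counterpart."""
    wide = wide_counterpart(ch)
    # Fall back to the character's own entry, then to the default class.
    return base.get(wide, base.get(ch, default))


untailored = BASE_LINE_BREAK["\uFF71"]      # class taken directly from the table
tailored = tailored_line_break("\uFF71")    # class of the wide counterpart
```

With this table, the half-width letter is AL untailored and ID once resolved through its wide counterpart, which is the local workaround each implementation currently has to carry.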
Half-width punctuation already has the correct classes, so it is fine. Symbols in U+FFE8-FFEE are AL, which also looks incorrect, but I cannot find these code points in any CJK legacy encoding. Where did they come from? The logical approach is to assign them the same classes as their wide counterparts, but I can’t be sure without knowing their origin.
Ken, does this change cause problems in terms of the stability policy?
> On Apr 29, 2015, at 10:22, Ken Whistler <kenwhistler at att.net> wrote:
> Taking this thread back to the original question...
> The Line_Break property values for halfwidth katakana (lb=AL)
> and regular katakana (lb=ID) have been stable since they
> were first defined for Unicode 3.0 -- 15 years ago.
> Regardless of whether lb=AL is the optimal assignment for
> the halfwidth katakana, it seems likely to me that trying to
> *change* that Line_Break assignment, just for halfwidth
> katakana, at this late date, would likely be more destabilizing
> for existing implementations, rather than helpful.
> The citations below show *different* behavior between browsers
> for linebreaking around halfwidth katakana. That suggests that
> Firefox and IE11 have already provided tailoring to better match
> expectations. The correct avenue forward, it seems to me, would
> be to pursue bugs against browsers that do not show expected
> behavior, to see if improvements there are feasible, rather than
> to modify the base Line_Break property values that everybody has
> to tailor *from*.
> Note that this is not *just* a Japanese problem nor a matter
> of not matching JIS X 4051. UAX #14 is *not* a direct implementation
> of JIS X 4051 rules, although it is certainly informed by them and
> has many Line_Break values introduced to get default behavior closer to
> the Japanese rules for linebreaking. And the compatibility halfwidth
> characters in the standard also include halfwidth jamo and symbols,
> so any changes also would need to be considered in the context
> of consistency for those and for *Korean* rules, as well as for Japanese.
> On 4/27/2015 10:57 PM, Makoto Kato wrote:
>> Hi, Suzuki-san. Thank you for your reply.
>>> At present, I have no objection to adding halfwidth katakana
>>> to the ideographic class in UAX#14, but I'm unfamiliar with the
>>> (negative) impact caused by the lack of halfwidth katakana
>>> in it. Could you tell me if you know anything?
>> Since half-width katakana isn't ID, lines don't break around it the
>> way they do around full-width katakana.
>> Firefox and IE11 treat half-width katakana as ID, so it line-breaks
>> the same way as full-width katakana.
>> Chrome doesn't treat it as ID, so half-width katakana doesn't get a
>> break opportunity per character.
>> I have read JIS X 4051, and it doesn't specify that half-width and
>> full-width katakana should be treated differently.
>>> I guess that inclusion in or exclusion from other classes, such as
>>> AI, AL, CJ, JL, JV, JT, SA, might be quite important for achieving
>>> appropriate line breaking, but inclusion in or exclusion from
>>> the ID class does not seem to be important. If inclusion
>>> in the ID class is important, more characters (e.g. Bopomofo)
>>> should be considered for full coverage. What do you think?
>> My question is why half-width katakana isn't in the same class as
>> full-width katakana. Half-width katakana is currently defined as AL
>> in the spec, so moving it to ID changes the break behavior
>> substantially (non-break -> break before or after).
>> -- Makoto
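To make the "non-break -> break" change mentioned above concrete, here is a deliberately simplified sketch. It models only two UAX#14 rules, LB28 (no break between two AL characters) and LB31 (break everywhere else), and ignores everything else in the algorithm. The function name break_opportunities is hypothetical, and the class lists are supplied by hand rather than read from the real data file.

```python
def break_opportunities(classes):
    """Return indices i where a break is allowed between item i-1 and item i.

    Simplified model: AL x AL forbids a break (LB28); any other pair
    allows one (LB31). All other UAX#14 rules are ignored here.
    """
    return [
        i
        for i in range(1, len(classes))
        if not (classes[i - 1] == "AL" and classes[i] == "AL")
    ]


text = "\uFF76\uFF80\uFF76\uFF85"  # four halfwidth katakana letters

# Same text, classified two ways.
as_al = break_opportunities(["AL"] * len(text))  # halfwidth katakana as AL
as_id = break_opportunities(["ID"] * len(text))  # halfwidth katakana as ID
```

Under this model, the AL classification yields no break opportunities inside the run, while the ID classification allows a break between every pair of characters.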