Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?

Ken Whistler kenwhistler at att.net
Tue Apr 28 20:22:47 CDT 2015

Taking this thread back to the original question...

The Line_Break property values for halfwidth katakana (lb=AL)
and regular katakana (lb=ID) have been stable since they
were first defined for Unicode 3.0 -- 15 years ago.

Regardless of whether lb=AL is the optimal assignment for
the halfwidth katakana, it seems to me that trying to
*change* that Line_Break assignment, just for halfwidth
katakana, at this late date, would more likely destabilize
existing implementations than help them.

The citations below show *different* behavior between browsers
for linebreaking around halfwidth katakana. That suggests that
Firefox and IE11 have already provided tailoring to better match
expectations. The correct avenue forward, it seems to me, would
be to pursue bugs against browsers that do not show expected
behavior, to see if improvements there are feasible, rather than
to modify the base Line_Break property values that everybody has
to tailor *from*.
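
As a rough illustration of what tailoring *from* the default values
means, here is a minimal Python sketch. It is not any browser's
implementation. The function names (default_class, tailored_class,
break_positions) are made up for this example, the class values are
the ones stated in this message (halfwidth katakana lb=AL, regular
katakana lb=ID), and the pair rule is reduced to "no break between
AL and AL, break allowed otherwise", which ignores nearly all of the
actual UAX #14 rules.

```python
import unicodedata


def default_class(ch):
    """Toy Line_Break lookup using the values stated in this message.

    Assumption: only non-small katakana letters are modeled; everything
    else is rejected rather than guessed at.
    """
    name = unicodedata.name(ch, "")
    if "SMALL" in name:
        raise ValueError("small kana are outside the scope of this sketch")
    if name.startswith("HALFWIDTH KATAKANA LETTER"):
        return "AL"
    if name.startswith("KATAKANA LETTER"):
        return "ID"
    raise ValueError("character not covered by this sketch: %r" % ch)


def tailored_class(ch):
    """Hypothetical tailoring: treat halfwidth katakana letters as ID."""
    cls = default_class(ch)
    name = unicodedata.name(ch, "")
    if name.startswith("HALFWIDTH KATAKANA LETTER"):
        return "ID"
    return cls


def break_positions(text, classify):
    """Return indexes where a break is allowed before text[i].

    Simplified pair rule: AL followed by AL never breaks; any other
    pair of the two modeled classes allows a break.
    """
    classes = [classify(ch) for ch in text]
    return [
        i
        for i in range(1, len(classes))
        if not (classes[i - 1] == "AL" and classes[i] == "AL")
    ]


halfwidth = "\uFF71\uFF72\uFF73"  # halfwidth katakana A, I, U
fullwidth = "\u30A2\u30A4\u30A6"  # regular katakana A, I, U

print(break_positions(halfwidth, default_class))
print(break_positions(fullwidth, default_class))
print(break_positions(halfwidth, tailored_class))
```

The point of the design is that the tailoring sits in a layer on top
of default_class, so the base values stay untouched for every other
implementation that depends on them.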

Note that this is not *just* a Japanese problem nor a matter
of not matching JIS X 4051. UAX #14 is *not* a direct implementation
of JIS X 4051 rules, although it is certainly informed by them and
has many Line_Break values introduced to get default behavior closer to
the Japanese rules for linebreaking. And the compatibility halfwidth
characters in the standard also include halfwidth jamo and symbols,
so any changes also would need to be considered in the context
of consistency for those and for *Korean* rules, as well as for Japanese.


On 4/27/2015 10:57 PM, Makoto Kato wrote:
> Hi, Suzuki-san.  Thank you for your reply.
>> At present, I have no objection to adding halfwidth katakana
>> to the ideographic class in UAX#14, but I'm unfamiliar with the
>> (negative) impact caused by the lack of halfwidth katakana
>> in it. Could you tell me if you know anything?
> Since half-width katakana isn't ID, it doesn't break lines the way
> full-width katakana does.
> Firefox and IE11 define half-width katakana as ID, so line breaking of
> half-width katakana is the same as for full-width katakana.
> Chrome doesn't define it as ID, so half-width katakana doesn't get a
> line break opportunity per character.
> I have read JIS X 4051, and it doesn't define half-width katakana
> and full-width katakana differently.
>> I guess, the inclusion or exclusion in other classes, like,
>> AI, AL, CJ, JL, JV, JT, SA might be quite important to realize
>> the appropriate line breaking, but the inclusion or exclusion
>> in ID-class does not seem to be important. If the inclusion
>> in ID-class is important, more characters (e.g. Bopomofo)
>> should be considered for full coverage. What do you think?
> My question is why half-width katakana characters aren't in the same
> class as full-width katakana characters.  In this case, half-width
> katakana is defined as AL in the current spec.  So when moving to ID,
> the break rule changes significantly (non-break -> break before or after).
> -- Makoto
