Hentaigana proposal

Wed Dec 16 12:17:06 CST 2015

On Wed, Dec 9, 2015 at 7:55 AM, Nicolas Tranter
<n.tranter at sheffield.ac.uk> wrote:
> I comment as a western Japanologist who teaches and researches using
> hentaigana. I have published with hentaigana using image files (resulting in
> two publisher errors) and will publish next year with hentaigana using the
> Koin Hentaigana font (Koin変体仮名外字明朝.tte), and anticipate typesetting
> problems. I refer to the 2015 proposal L2/15-239 to include hentaigana,
> including the appended paper by Takada Tomokazu, Yada Tsutomu and Saito
> Tatsuya ('The past, present and future of Hentaigana Standardization for
> Information Interchange'). I also refer to Yada Tsutomu's support of the
> proposal ('About the inclusion of standardized codepoints for Hentaigana',
> L2/15-318). As the names and numbering of proposed characters are an issue I
> deal with below, I also refer to individual hentaigana in the proposal by
> their MJ-codes as used in the proposers' own websites (e.g.
> http://mojikiban.ipa.go.jp/xb164/).
>
>
>
> SELECTION: The selection is good, consisting of 286 forms, although this
> would be realised as 299 characters. The earlier 2009 proposal referred to
> was based on the Mojikyo M113.ttf font, which has 213 hentaigana characters
> and includes a few major basic gaps. The Koin Hentaigana font has 549
> characters, which excluding separate forms with voicing and 'half-voicing'
> diacritics consists of 330 hentaigana, but includes some very rare forms,
> including ones that do not occur in late period texts.
>
>
>
> The selection of 'academic' hentaigana is appropriate and lacks major gaps.
> On the other hand, the Ministry of Justice hentaigana requirements are ones
> that have been decided by the Ministry of Justice in 2004 for name
> registration purposes, and so, although one could argue easily with their
> 2004 decision (and I would), the fact that they are already official means
> it is pointless to argue with their inclusion in Unicode.
>
>
>
> It's been noted that a few hentaigana are almost identical to normal
> hiragana, especially e HENTAIGANA LETTER E VARIANT 4 = MJ090017 (cf. HIRAGANA LETTER E え), shi
> HENTAIGANA LETTER SI VARIANT 2 = MJ090072 (cf. HIRAGANA LETTER SI し) and nu
> HENTAIGANA LETTER NU VARIANT 2 = MJ090149 (cf. HIRAGANA LETTER NU ぬ): their
> differences are solely that the 'brush' is removed from the paper on a
> downward rather than a rightward flourish, reflecting vertical handwriting.
> Ordinarily I would argue against including them, but since the MoJ has
> recognised them as official variants they need to be included.
>
>
>
> The decision to propose in most cases one codepoint for the hentaigana
> derived from a single Chinese character is sensible, as also is the decision
> to allow multiple codepoints in certain cases where manuscripts use
> side-by-side significantly distinct forms derived from the same Chinese
> character and with the same value. An example of the latter is HENTAIGANA
> LETTER KA VARIANT 3 = MJ090025 and KA VARIANT 4 = MJ090026, both pronounced
> ka and both derived from the Chinese character 可, but which are routinely
> both found in the same manuscript by the same hand as if they were separate
> graphemes from the Heian to the Meiji periods.
>
>
>
> POLYPHONY. Several hentaigana are truly polyphonous (e.g. the 子-derived
> hentaigana = ne MJ090151 or ko MJ090059, or the 馬-derived hentaigana = me
> MJ090222 or ma MJ090205). In particular, those hentaigana derived from 无 and
> associated with n (MJ090298, MJ090299) historically (also the source of
> HIRAGANA LETTER N ん) are also used for mu (MJ090214, MJ090215) and mo
> (MJ090224, MJ090223). Diachronically, n in native Japanese words is usually
> derived from an earlier mu. Takada et al. includes a list of 10 kanji
> sources that this applies to in the proposed repertoire. (Strictly, this
> affects 11 hentaigana, because the proposal has two forms for 无-derived
> characters.) The proposal's solution is to assign different identifiers,
> e.g. 子 = HENTAIGANA LETTER NE VARIANT 1 and HENTAIGANA LETTER KO VARIANT 2,
> 馬 = HENTAIGANA LETTER ME VARIANT 3 and HENTAIGANA LETTER MA VARIANT 7, and
> the two derived from 无 = HENTAIGANA LETTER N VARIANT 1, N VARIANT 2, MU
> VARIANT 1, MU VARIANT 2, MO VARIANT 1 and MO VARIANT 2. This means that
> there would be characters that are given more than one codepoint and
> identifier but are formally and etymologically identical, adding 13
> unnecessary repetitions to the character set. I would favour Yada's naming
> system, where the polyphonous characters are given a single codepoint and
> identifier, e.g. 子 = HENTAIGANA LETTER NE-KO, 馬 = HENTAIGANA LETTER ME-MA, and two
> 无-derived forms = HENTAIGANA LETTER N-MU-MO 1 and N-MU-MO 2.
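
To make the count of 13 repetitions concrete, here is a rough Python sketch of the arithmetic. It uses only the readings quoted above for 子, 馬 and the two 无-derived forms. The assumption that each of the other 7 kanji sources has exactly two readings is mine, not something stated in the proposal, and the dictionary keys are illustrative labels rather than proposed names.

```python
# Readings per polyphonous glyph, taken only from the examples quoted above.
# Keys are illustrative labels, not proposed character names.
POLYPHONOUS = {
    "子": ["NE", "KO"],
    "馬": ["ME", "MA"],
    "无 (form 1)": ["N", "MU", "MO"],
    "无 (form 2)": ["N", "MU", "MO"],
}


def merged_name(readings, index=None):
    """Build a single name in the style of Yada's combined identifiers."""
    name = "HENTAIGANA LETTER " + "-".join(readings)
    return f"{name} {index}" if index is not None else name


def extra_code_points(table):
    """Code points beyond one per glyph when each reading is encoded separately."""
    return sum(len(readings) - 1 for readings in table.values())


# ASSUMPTION (not from the proposal): the remaining 7 of the 10 kanji sources
# each yield one glyph with exactly two readings, i.e. one extra code point each.
ASSUMED_OTHER_TWO_READING_GLYPHS = 7

total_extra = extra_code_points(POLYPHONOUS) + ASSUMED_OTHER_TWO_READING_GLYPHS

print(merged_name(POLYPHONOUS["子"]))
print(merged_name(POLYPHONOUS["无 (form 1)"], 1))
print(total_extra)
```

Under that assumption the four glyphs listed account for 6 extra code points and the other 7 for one each, which matches the 13 repetitions mentioned in the quoted message.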

Is there a reason for sticking with the "VARIANT 1"/"VARIANT 2" naming
convention? The previous proposal was for standardized variation
sequences, so this opaque numbering made sense (since "VARIANT 1"
meant "using the first variation selector"), but the current one is to
encode them all as atomic characters. Wouldn't it be more helpful to
give them more descriptive names, possibly by identifying the
particular ideographs each is derived from? For example, instead of
HENTAIGANA LETTER E VARIANT 2, it could be HENTAIGANA LETTER E FROM
CJK-76C8. This doesn't help with same-source variants, but physical
features could work for that, e.g.

HENTAIGANA LETTER YO VARIANT 4 -> HENTAIGANA LETTER YO FROM CJK-8207 WITH CROSSBAR
HENTAIGANA LETTER YO VARIANT 5 -> HENTAIGANA LETTER YO FROM CJK-8207 WITH LOOP
HENTAIGANA LETTER YO VARIANT 6 -> HENTAIGANA LETTER YO FROM CJK-8207 WITH ZIGZAG

It's more verbose but it seems like it would be useful to be able to
identify which variant is which from the name instead of having to
consult the code charts (which IIRC aren't normative) or some
supplementary table.
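
For illustration, the naming pattern suggested above could be generated mechanically. This is only a sketch: the function name `descriptive_name` and its parameters are hypothetical, and the "FROM CJK-xxxx" / "WITH feature" format is just the suggestion from this message, not anything in the proposal or the standard.

```python
def descriptive_name(reading, source, feature=None):
    """Build a name like 'HENTAIGANA LETTER YO FROM CJK-8207 WITH LOOP'.

    reading: the kana reading, e.g. "E" or "YO"
    source:  the source ideograph, as a one-character string or a code point
    feature: optional distinguishing physical feature for same-source variants
    """
    code_point = ord(source) if isinstance(source, str) else source
    name = f"HENTAIGANA LETTER {reading.upper()} FROM CJK-{code_point:04X}"
    if feature:
        name += f" WITH {feature.upper()}"
    return name


print(descriptive_name("E", 0x76C8))
print(descriptive_name("YO", 0x8207, "crossbar"))
print(descriptive_name("YO", 0x8207, "loop"))
print(descriptive_name("YO", 0x8207, "zigzag"))
```

The source ideograph is passed as a code point here so the example does not depend on the mail client rendering the characters correctly. The feature labels would still have to be chosen by hand for each same-source variant.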