Hentaigana proposal

Wed Dec 16 12:31:46 CST 2015

I like the more descriptive names, but I'd like to have this data
available in some supplementary table anyway, regardless of the naming
scheme.
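
As a rough illustration of what such a supplementary table might look
like, here is a minimal Python sketch. The field names and the by_source
helper are hypothetical, the rows are limited to pairings cited in the
quoted message below, and the NE/KO name-to-MJ pairing is inferred from
the order in which that message lists them.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Entry(NamedTuple):
    name: str     # character name as used in the proposal
    mj_code: str  # MJ code cited in the quoted message
    source: str   # source kanji the form derives from


# Illustrative rows only, not the full repertoire.
TABLE = [
    Entry("HENTAIGANA LETTER KA VARIANT 3", "MJ090025", "可"),
    Entry("HENTAIGANA LETTER KA VARIANT 4", "MJ090026", "可"),
    Entry("HENTAIGANA LETTER NE VARIANT 1", "MJ090151", "子"),  # pairing inferred
    Entry("HENTAIGANA LETTER KO VARIANT 2", "MJ090059", "子"),  # pairing inferred
]


def by_source(table: List[Entry]) -> Dict[str, List[Entry]]:
    """Group entries by source kanji, preserving table order."""
    groups: Dict[str, List[Entry]] = defaultdict(list)
    for entry in table:
        groups[entry.source].append(entry)
    return dict(groups)


if __name__ == "__main__":
    # Print each source kanji with the MJ codes derived from it.
    for kanji, entries in by_source(TABLE).items():
        print(kanji, [e.mj_code for e in entries])
```

Keying the rows by source kanji keeps same-source variants and polyphonous
forms side by side, whichever naming scheme ends up in the standard.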

2015-12-16 16:17 GMT-02:00 Garth Wallace <gwalla at gmail.com>:

> On Wed, Dec 9, 2015 at 7:55 AM, Nicolas Tranter
> <n.tranter at sheffield.ac.uk> wrote:
> > I comment as a western Japanologist who teaches and researches using
> > hentaigana. I have published with hentaigana using image files (resulting
> > in two publisher errors) and will publish next year with hentaigana using
> > the Koin Hentaigana font (Koin変体仮名外字明朝.tte), and anticipate
> > typesetting problems. I refer to the 2015 proposal L2/15-239 to include
> > hentaigana, including the appended paper by Takada Tomokazu, Yada Tsutomu
> > and Saito Tatsuya ('The past, present and future of Hentaigana
> > Standardization for Information Interchange'). I also refer to Yada
> > Tsutomu's support of the proposal ('About the inclusion of standardized
> > codepoints for Hentaigana', L2/15-318). As the names and numbering of
> > proposed characters is an issue I deal with below, I also refer to
> > individual hentaigana in the proposal by their MJ-codes as used in the
> > proposers' own websites (e.g. http://mojikiban.ipa.go.jp/xb164/).
> >
> >
> >
> > SELECTION: The selection is good, consisting of 286 forms, although this
> > would be realised as 299 characters. The earlier 2009 proposal referred
> > to was based on the Mojikyo M113.ttf font, which has 213 hentaigana
> > characters and includes a few major basic gaps. The Koin Hentaigana font
> > has 549 characters, which excluding separate forms with voicing and
> > 'half-voicing' diacritics consists of 330 hentaigana, but includes some
> > very rare forms, including ones that do not occur in late period texts.
> >
> >
> >
> > The selection of 'academic' hentaigana is appropriate and lacks major
> > gaps. On the other hand, the Ministry of Justice hentaigana requirements
> > are ones that have been decided by the Ministry of Justice in 2004 for
> > name registration purposes, and so, although one could argue easily with
> > their 2004 decision (and I would), the fact that they are already
> > official means it is pointless to argue with their inclusion in Unicode.
> >
> >
> >
> > It's been noted that a few hentaigana are almost identical to normal
> > hiragana, especially e HENTAIGANA LETTER E VARIANT 4 = MJ090017 (cf. え),
> > shi HENTAIGANA LETTER SI VARIANT 2 = MJ090072 (cf. HIRAGANA LETTER SI し)
> > and nu HENTAIGANA LETTER NU VARIANT 2 = MJ090149 (cf. HIRAGANA LETTER NU
> > ぬ): their differences are solely that the 'brush' is removed from the
> > paper on a downward rather than a rightward flourish, reflecting vertical
> > handwriting. Ordinarily I would argue against including them, but since
> > the MoJ has recognised them as official variants they need to be included.
> >
> >
> >
> > The decision to propose in most cases one codepoint for the hentaigana
> > derived from a single Chinese character is sensible, as also is the
> > decision to allow multiple codepoints in certain cases where manuscripts
> > use side-by-side significantly distinct forms derived from the same
> > Chinese character and with the same value. An example of the latter is
> > HENTAIGANA LETTER KA VARIANT 3 = MJ090025 and KA VARIANT 4 = MJ090026,
> > both pronounced ka and both derived from the Chinese character 可, but
> > which are routinely both found in the same manuscript by the same hand as
> > if they were separate graphemes from the Heian to the Meiji periods.
> >
> >
> >
> > POLYPHONY. Several hentaigana are truly polyphonous (e.g. the 子-derived
> > hentaigana = ne MJ090151 or ko MJ090059, or the 馬-derived hentaigana = me
> > MJ090222 or ma MJ090205). In particular, those hentaigana derived from 无
> > and associated with n (MJ090298, MJ090299) historically (also the source
> > of HIRAGANA LETTER N ん) are also used for mu (MJ090214, MJ090215) and mo
> > (MJ090224, MJ090223). Diachronically, n in native Japanese words is
> > usually derived from an earlier mu. Takada et al. includes a list of 10
> > kanji sources that this applies to in the proposed repertoire. (Strictly,
> > this affects 11 hentaigana, because the proposal has two forms for
> > 无-derived characters.) The proposal's solution is to assign different
> > identifiers, e.g. 子 = HENTAIGANA LETTER NE VARIANT 1 and HENTAIGANA
> > LETTER KO VARIANT 2, 馬 = HENTAIGANA LETTER ME VARIANT 3 and HENTAIGANA
> > LETTER MA VARIANT 7, and the two derived from 无 = HENTAIGANA LETTER N
> > VARIANT 1, N VARIANT 2, MU VARIANT 1, MU VARIANT 2, MO VARIANT 1 and MO
> > VARIANT 2. This means that there would be characters that are given more
> > than one codepoint and identifier but are formally and etymologically
> > identical, adding 13 unnecessary repetitions to the character set. I
> > would favour Yada's naming system, where the polyphonous characters are
> > given a single codepoint and identifier, e.g. 子 = HENTAIGANA LETTER
> > NE-KO, 馬 = HENTAIGANA LETTER ME-MA, and two 无-derived forms = HENTAIGANA
> > LETTER N-MU-MO 1 and N-MU-MO 2.
>
> Is there a reason for sticking with the "VARIANT 1"/"VARIANT 2" naming
> convention? The previous proposal was for standardized variation
> sequences, so this opaque numbering made sense (since "VARIANT 1"
> meant "using the first variation selector"), but the current one is to
> encode them all as atomic characters. Wouldn't it be more helpful to
> give them more descriptive names, possibly by identifying the
> particular ideographs each is derived from? For example, instead of
> HENTAIGANA LETTER E VARIANT 2, it could be HENTAIGANA LETTER E FROM
> CJK-76C8. This doesn't help with same-source variants, but physical
> features could work for that, e.g.
>
> HENTAIGANA LETTER YO VARIANT 4 -> HENTAIGANA LETTER YO FROM CJK-8207 WITH CROSSBAR
> HENTAIGANA LETTER YO VARIANT 5 -> HENTAIGANA LETTER YO FROM CJK-8207 WITH LOOP
> HENTAIGANA LETTER YO VARIANT 6 -> HENTAIGANA LETTER YO FROM CJK-8207 WITH ZIGZAG
>
> It's more verbose but it seems like it would be useful to be able to
> identify which variant is which from the name instead of having to
> consult the code charts (which IIRC aren't normative) or some
> supplementary table.
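
A minimal sketch of how names in the pattern suggested above could be
assembled mechanically from a kana value, the source ideograph's code
point and an optional physical feature. The function name and parameters
are hypothetical, and the name pattern simply follows the examples in this
message; it is not an approved Unicode naming convention.

```python
from typing import Optional


def descriptive_name(kana: str, source_cp: int,
                     feature: Optional[str] = None) -> str:
    """Build a name like 'HENTAIGANA LETTER E FROM CJK-76C8'.

    kana      -- romanized value as used in the proposal, e.g. "E" or "YO"
    source_cp -- code point of the source ideograph, e.g. 0x76C8
    feature   -- optional distinguishing feature, e.g. "CROSSBAR"
    """
    # Format the code point as at least four uppercase hex digits.
    name = f"HENTAIGANA LETTER {kana.upper()} FROM CJK-{source_cp:04X}"
    if feature:
        name += f" WITH {feature.upper()}"
    return name


if __name__ == "__main__":
    print(descriptive_name("E", 0x76C8))
    for feature in ("CROSSBAR", "LOOP", "ZIGZAG"):
        print(descriptive_name("YO", 0x8207, feature))
```

The feature words would still have to be chosen by hand for each
same-source variant, so a table of source ideographs remains useful
either way.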
>
>