[Unicode] Re: HENTAIGANA LETTER E-1

Wed Jan 13 15:39:26 CST 2016

On Fri, Jan 8, 2016 at 6:55 AM, suzuki toshiya
<mpsuzuki at hiroshima-u.ac.jp> wrote:
> Garth Wallace wrote:
>> On Thu, Jan 7, 2016 at 7:56 AM, suzuki toshiya
>> <mpsuzuki at hiroshima-u.ac.jp> wrote:
>>> Hi,
>>>
>>> I'm not a representative of the experts working for the
>>> proposal from Japan NB, but I could explain something.
>>>
>>> 1) "They never took that out?" I'm not sure whom you mean by
>>> "they" (UTC? JNB?), but it seems that no official document
>>> asking for a response from JNB has been submitted in WG2.
>>> If UTC sends something officially, JNB would respond,
>>> I believe.
>>
>> I meant the JNB. I thought they had removed that character from the
>> later revised proposals that were posted on the UTC document register,
>> but I checked and I had apparently been mistaken.
>>
>> The issue is only raised in passing in a footnote in Mr. Lunde's feedback.
>
> I think HENTAIGANA LETTER E-1 is intentionally proposed
> to be coded separately, and no official document has been
> sent to JNB, so it is still kept as it was before.
>
>>> 2) Difference in HENTAIGANA LETTER E-1 and U+1B001.
>>>
>>> U+1B001 is a character designed to note an ancient (and
>>> extinct in modern Japanese language) pronunciation YE.
>>>
>>> When standard kana was defined about 100 years ago,
>>> the pronunciation YE had already merged with E.
>>> Some scholars planned to use a few kana-like characters
>>> to note such pronunciations (to discuss ancient Japanese
>>> pronunciation), and used some hentaigana-like glyphs for
>>> that purpose. As far as I know, there is no wide consensus
>>> that the glyph looking like U+1B001 was historically used
>>> mainly to note YE, when YE and E were distinguished in
>>> the Japanese language.
>>
>> AIUI they simply reused an existing hentaigana to make the
>> distinction, rather than making a new kana that just happened to look
>> exactly like it.
>
> It is difficult (for me) to judge whether U+1B001 has the same
> identity as the similar-looking hentaigana from before kana
> standardization. The encoding of U+1B001 was justified by
> its unique phonetic value, so its character name is YE. It
> is normative. Some people may think they can identify the
> hentaigana by their glyph shapes only, but others may have
> a different view. As the first proposal (L2/15-193) prioritized
> the (modern) phonetic value as the first key to identify the
> glyph, I think some user communities would want to identify the
> glyph by the phonetic value. I don't say it is the best
> solution, but I say they have their own rationale.

The rationale for U+1B001, AIUI, was that it was used in some modern
scholarly works about the history of the Japanese language to
distinguish between /e/ and /je/ before they merged in the modern
language. I don't know if historically that distinction existed in
writing.

The character name is normative. But the pronunciation is not, and I
don't think the Unicode name should be taken to mean that it can only
be used when a particular pronunciation is intended. Spelling and
pronunciation are outside of Unicode's scope.
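
To illustrate that the name is a fixed label rather than a usage rule,
here is a minimal Python sketch. It assumes a Python build whose
unicodedata tables include the Kana Supplement block (Unicode 6.0 or
later); the helper name character_name is made up for this example.

```python
import unicodedata

# U+1B001 is the code point discussed in this thread.
ARCHAIC_YE = "\U0001B001"


def character_name(ch: str) -> str:
    """Return the character name recorded in the local Unicode database.

    Falls back to a placeholder string if the database predates the
    character (an assumption-friendly default, not a Unicode convention).
    """
    return unicodedata.name(ch, "<not in this Unicode database>")


# The name is only an identifier; nothing in the database says how the
# character must be pronounced or in which spellings it may appear.
print(f"U+{ord(ARCHAIC_YE):04X}", character_name(ARCHAIC_YE))
```

The lookup returns only the identifier; how a text uses the character
is left to its author.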

>>> On the other hand, JNB's proposal does not include any
>>> ancient/extinct pronunciation. Their phonetic coverage
>>> is exactly the same as that of modern Japanese. So,
>>> the glyph looking like U+1B001 is not designed to note
>>> the pronunciation YE. JNB's motivation for proposing these
>>> hentaigana would be just their shape differences.
>>>
>>> Therefore, U+1B001 and HENTAIGANA E-1 could be said to be
>>> differently designed: their intended usages are different.
>>> Please do not think JNB hentaigana experts overlooked
>>> U+1B001 and proposed a duplicate encoding. They must
>>> have known about it and proposed the character anyway.
>>
>> It's not unknown for a single character to have more than one
>> pronunciation in different contexts.
>
> Is it easy to tell from context how the "unified U+1B001"
> should be pronounced (in some cases it must be YE, in some
> cases it must be E, in some cases either is acceptable)?
> I don't have a good connection with the user community of
> U+1B001, so I cannot estimate which is easier (less troublesome
> for existing user communities): separation or unification. Do
> you have any connection with the user community of U+1B001?

I do not. For that matter, I'm not a member of the UTC. I've only read
Nozomu Katō's original proposal
<http://www.unicode.org/L2/L2007/07421-e-ye.pdf> and some of the
documents that followed.

>>> However, some WG2 experts suggested unifying them because
>>> of the shape similarity. I'm not sure whether the 2 glyphs are
>>> indistinguishably similar for hentaigana scholars, but I
>>> accept that some people find them hard to distinguish.
>>> I cannot distinguish some Latin and Greek letters when
>>> they are displayed as single isolated characters.
>>
>> We're not talking about different scripts, though. Hentaigana
>> are obsolete hiragana (eliminated from modern written Japanese by a
>> spelling reform) but they are still hiragana. Latin and Greek, on the
>> other hand, are clearly separate but related scripts.
>
> I'm afraid that the count of how many scripts are in the set
> of modern hiragana, U+1B001 and the JNB proposal could depend
> on the person. Some people may count only 1, some people
> may count 2, some people may count 3. If there were already a
> stable consensus, it could be used as the rationale to unify,
> but I don't think there is. Anyway, Latin and Greek were not
> a good example, I'm sorry.

You're right, it's unclear, though at least in Unicode terms I don't
think you can really count 3. U+1B001 has the script property
"hiragana", but that still leaves the question of whether hentaigana
should be considered a separate script from hiragana. The proposal
summary for L2/15-239 does say it's for a new script, named
"hentaigana". However, elsewhere in that document it says "In year
1900, Japanese government selected one phonogram for each phonetic
value and announced not to use other phonograms in elementary
education. Afterward, the selected phonograms are called “HIRAGANA”
and others are called “HENTAIGANA”, the meaning is variants of a
HIRAGANA." Also, the original proposal was to encode them as Standard
Variation Sequences of hiragana, which I think implies that the JNB,
at least at that time, considered them to be variants of hiragana and
not something other than hiragana. AIUI, and correct me if I'm wrong,
hentaigana is a retronym; at the time they were in regular use they
were used in combination with and interchangeably with the modern set
of hiragana, and did not have an identity as a distinct set until the
spelling reform of 1900.

I believe that in Unicode, characters that were once used in a script
but were later made obsolete are usually still considered part of the
same script as the surviving set. That has been the case for Latin, at
least.