Unicode Teaching in Universities

Phake Nick c933103 at gmail.com
Fri Sep 3 12:58:13 CDT 2021


I would just say that I still run into problems caused by CJK
unification day to day. To make sure the correct glyph is displayed
today, you have to either use an IVS (Ideographic Variation Sequence)
or specify the language or font for each character, and on many
platforms that is not really possible. I don't think it is fair to
expect an average user to pick the proper IVS or key in the correct
language tag while sending a tweet on Twitter or writing a primary
school assignment in Notepad, Google Docs, Evernote, or Microsoft Word.
Twitter's character limit wouldn't leave enough room to do so anyway.
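
To make the cost of an IVS concrete, here is a minimal Python sketch.
The base character U+845B and the selector U+E0100 are chosen only for
illustration; which glyph a given sequence selects depends on the
registered collection and on font support, which this code does not
check.

```python
import unicodedata

# U+845B is a single unified code point shared by Chinese and Japanese.
base = "\u845B"

# Appending a variation selector from the supplementary range forms an
# Ideographic Variation Sequence (IVS). U+E0100 is illustrative only.
ivs = base + "\U000E0100"

for label, text in (("plain", base), ("with IVS", ivs)):
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in text)
    utf8_len = len(text.encode("utf-8"))
    print(f"{label}: {code_points} ({len(text)} code points, {utf8_len} UTF-8 bytes)")

# The selector is a character in its own right, with its own name.
print(unicodedata.name(ivs[1]))
```

Every character written this way carries an extra code point that the
user has to pick somehow, and that most input methods do not offer.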

The last time I saw an incorrect CJK glyph due to Han unification was
yesterday, when I searched Google Japan for information about the
Chinese city of Xiamen. Google Japan displays results in multiple
languages, including results written in Simplified Chinese, and since
this was Google Japan with a Japanese interface, the Simplified Chinese
characters of the city name in those results were rendered with
Japanese glyphs, which are incorrect. It doesn't seem realistic to
expect people in China to use an IVS when typing the name of their own
city just to prevent the wrong Japanese glyph from appearing.
The last time I contacted a company about a CJK compatibility problem
was a few days ago, when I asked Freewrite, a typewriter maker, about
their products' multilingual support, for example the ability to
display the correct Chinese or Japanese glyphs in Chinese and Japanese
documents. Their products are intended to let writers type wherever
they like with minimal distraction; even the arrow keys have been
removed, because they believe editing can be done later and leaving it
out makes writing more efficient. Their answer to my question was
simply that if I was looking for so many features, their product
wouldn't be suitable for me.
The last time I ran into a Han unification problem during input was an
hour ago, on Duolingo. Duolingo now offers its Chinese courses in both
Simplified and Traditional Chinese characters, but because of the
nature of Han unification it only transliterated the UI between
Simplified and Traditional characters. Han unification means the two
are not told apart during input, so my only way to complete the
Duolingo lessons that require typing Chinese is to use the Google
Translate feature of Google's virtual keyboard to transliterate my
input into the form Duolingo accepts.
The last time I tried and failed to solve a Han unification problem was
on OpenStreetMap. OpenStreetMap records every kind of object in the
world, from oceans and continents to individual streets, shops, and
buildings, and even individual lampposts in a park, by letting users
enter the object's name in " name=* ", where the name is supposed to be
the locally used name in the local language. With this information the
OpenStreetMap project can build a map of the world using only its
contributors' knowledge, but nowhere in the process is a language tag
specified, and nothing indicates which language a name is in. There is
also " name:langcode=* ", but that is treated as a way to record
foreign translations of a local object's local name. There are
proposals to solve the problem by specifying the language of each
object's name, but of course no one is going to do that for every
lamppost in the world. There are also proposals to assume the language
of a name from national boundaries, but that doesn't cover cases such
as a place intentionally named in a foreign language.
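
A rough Python sketch of the national-boundary idea and where it goes
wrong. Everything here is hypothetical: the DEFAULT_LANGUAGE table, the
guess_name_language function, and the restaurant tags are made up for
illustration and are not taken from any OpenStreetMap tool or data.

```python
# Hypothetical table: one default language per country, standing in for
# the "assume the language from the national boundary" proposal.
DEFAULT_LANGUAGE = {"CN": "zh-Hans", "TW": "zh-Hant", "JP": "ja"}


def guess_name_language(tags, country_code):
    """Guess the language of name=* for one map object.

    `tags` is a plain dict of OSM-style key/value pairs. The bare "name"
    key carries no language information, so the only thing to go on is
    the country the object sits in.
    """
    if "name" not in tags:
        return None
    return DEFAULT_LANGUAGE.get(country_code)


# Made-up example: a restaurant in Japan that deliberately uses its
# Chinese name as the local name.
restaurant = {"name": "厦门酒家", "name:ja": "アモイ酒家"}
print(guess_name_language(restaurant, "JP"))  # prints "ja", wrong here
```

The guess is driven entirely by where the object is, so a renderer
following it would draw the Chinese name with Japanese glyphs.
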
The last time Han unification saved me was when some Chinese users
made Japanese-looking posters to pass off their products as Japanese.
Because their systems were not Japanese, the font used was not a
Japanese one, and I could tell that these were not authentic Japanese
products.

I also remember that some professional text reading and editing tools
try to decide how punctuation should be positioned by checking whether
the last character in the line is kana or kanji, and then apply the
Japanese or the Chinese punctuation-positioning rule accordingly, even
though Japanese sentences can also end in kanji. The result is
punctuation that drifts between different corners of the line
throughout an article.
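
The heuristic can be sketched roughly like this in Python. This is my
reconstruction of the idea, not the code of any actual tool; the
function names and the simplified code point ranges are assumptions.

```python
def is_kana(ch):
    # Hiragana U+3040..U+309F and Katakana U+30A0..U+30FF.
    return "\u3040" <= ch <= "\u30ff"


def is_han(ch):
    # Only the main CJK Unified Ideographs block, U+4E00..U+9FFF.
    return "\u4e00" <= ch <= "\u9fff"


def guess_punctuation_rule(line):
    """Naive guess based on the last kana or Han character in the line."""
    for ch in reversed(line):
        if is_kana(ch):
            return "japanese"
        if is_han(ch):
            return "chinese"
    return "unknown"


# Both lines are Japanese, but the second one ends in kanji.
print(guess_punctuation_rule("今日は雨です。"))
print(guess_punctuation_rule("明日は東京に出張。"))
```

The first line is classified as Japanese and the second as Chinese, so
within one Japanese article the punctuation rule flips from line to
line.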

There are also quite a number of tools, and even fonts, that try to do
simple Simplified/Traditional Chinese conversion by mapping characters
to one another. Since nothing in Unicode itself can tell whether a
character is a Chinese character or a Japanese kanji, these systems
have produced a large amount of mistransliterated Japanese content
floating around the internet.
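
A minimal Python sketch of such a character-mapping converter. The
three-entry table and the function name are hypothetical; real
converters ship far larger tables, but a pure code point mapping works
the same way.

```python
# Tiny hypothetical Traditional-to-Simplified table.
TRAD_TO_SIMP = str.maketrans({"飛": "飞", "機": "机", "門": "门"})


def to_simplified(text):
    # Maps code point to code point, with no idea what language the
    # text is written in.
    return text.translate(TRAD_TO_SIMP)


print(to_simplified("飛機"))          # Traditional Chinese "airplane"
print(to_simplified("飛行機で行く"))  # a Japanese sentence
```

The Chinese word comes out as 飞机, as intended, but the Japanese
sentence comes out as 飞行机で行く, which is no longer Japanese,
because the kanji share code points with the Traditional characters
in the table.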

An alternative to Han unification could be a separate multilingual
plane for each East Asian language. That could also help with the
disappearance of old glyphs caused by the adoption of national standard
fonts. Because IVS is hard for most regular users to use, most people
rely on language tagging to make sure their content is displayed
properly, and that in turn pushes font makers toward fonts whose glyphs
conform to the national standards, because part of the language tagging
system involves country codes, and sometimes that does not match the
glyph people actually want.

A deeper problem with Unicode's Han unification is this. It is so
convenient and so global for simple day-to-day use, where the content
is what matters, and even when there are mistakes and errors it mostly
works well enough rather than giving you something completely broken,
that there is very little demand for a new encoding system. Yet by the
time Unicode emerged it was already recognized that a character-based
encoding is not really a proper way to encode Chinese characters: it
effectively ends the creation of new characters that would occur
naturally as the language is used. In the analog era anyone could
write a new character however they liked and spread it around, and if
the usage caught on it became part of the language, but that is
impossible to do through Unicode.

To get a new Chinese character into Unicode, one first has to submit an
application to one's national government and then to the council for
Unihan, wait for it to be included in a Unicode CJK extension, wait for
fonts to add support for the character at its new code point, and then
wait for the new fonts to be distributed through operating system
updates or shipped with new devices. The process takes a decade in the
fortunate case, and only if OS vendors are willing to add the glyph to
their system default fonts. It also creates an endlessly growing list
of characters. A component-based system, like Korean Hangul encoded in
decomposed form, could have mitigated the problem, but the ideographic
description characters aren't really up to the task, and no system
actually treats them as a way for users to combine components and form
new characters.
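
The difference can be seen with Python's standard unicodedata module.
This is only a small sketch; the sample syllable and the sample
description sequence are my own picks for illustration.

```python
import unicodedata

# A precomposed Hangul syllable decomposes canonically into its jamo,
# and the jamo compose back into the same syllable.
syllable = "\uD55C"  # 한
jamo = unicodedata.normalize("NFD", syllable)
print([f"U+{ord(ch):04X}" for ch in jamo])
print(unicodedata.normalize("NFC", jamo) == syllable)

# An Ideographic Description Sequence only describes a character.
# U+2FF0 (left to right) followed by 木 木 describes 林 (U+6797), but
# normalization does not turn the sequence into that character.
ids = "\u2FF0\u6728\u6728"  # ⿰木木
composed = unicodedata.normalize("NFC", ids)
print(composed == "\u6797", len(composed))
```

The jamo round-trip back to the syllable, while the description
sequence stays as three separate code points that a renderer shows
side by side rather than as one character.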

On Fri, Sep 3, 2021 at 1:30 AM, Doug Ewell <doug at ewellic.org> wrote:
>
> Phake Nick wrote:
>
> > but in recent years I feel like I have hear more about the downside
> > of using the Unicode system as a tool developed from early era of
> > computing before internet became popular and the use of such system
> > to digitalize the entire world's text,
>
> It would be interesting to hear specifically what the "downside" is. Maybe Phake Nick can elaborate, or ask those who are unhappy with Unicode to elaborate.
>
> Does the fact that Unicode was originally developed more than 30 years ago (I guess that's the "early era") bother people? How does "before internet became popular" play into this? A universal character set, free from the context-sensitive character set switching used in the JIS X standards, should be an ideal solution for the Internet.
>
> Are users in Japan still concerned about Japanese characters requiring 3 bytes in UTF-8 as opposed to 2 bytes in the JIS X standards? Does UTF-8's immunity from cross-site scripting attacks not outweigh this for Web purposes?
>
> Do they still want to use out-of-band character-set designators as font selection hints? Are there still objections to CJK unification? And so on.
>
> --
> Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org
>
>


