Addition of Latin theta as a separate codepoint

Mon Apr 11 09:46:25 CDT 2022

ag disroot wrote:

> Speaking of which, katakana, hiragana, and the japanese kanji should
> all also get a script extension of japanese. because as it stands,
> harfbuzz seems to think that japanese is a mix of 3 languages. which
> means that I can't do any sort of ligature or contextual things on
> japanese bc of that

Hiragana and Katakana are both used to write only one language, Japanese. If HarfBuzz tries to correlate scripts (which Unicode encodes) with languages (which Unicode does not encode) for rendering purposes, then out of all of the world's scripts, it ought to get these two right.

The ISO 15924 code element [Jpan] is defined as "Japanese (alias for Han + Hiragana + Katakana)". It is intended for Japanese texts written in a combination of these three scripts, as most Japanese texts are, not for characters in isolation.
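
To illustrate the difference between per-character scripts and a [Jpan] label applied to a whole text, here is a rough sketch in Python. It is not how HarfBuzz or any particular itemizer works. The helper names (rough_script, script_runs) are made up for this example, and because Python's standard library does not expose the Unicode Script property, the script is guessed from character names, which is only an approximation (it mislabels shared characters such as U+30FC, for instance).

```python
import unicodedata

# Hypothetical helper: approximate the Unicode Script property from the
# character name. This is an assumption made for illustration only; a real
# implementation would read Scripts.txt or use a library that exposes it.
def rough_script(ch):
    name = unicodedata.name(ch, "")
    if name.startswith("HIRAGANA"):
        return "Hira"
    if name.startswith("KATAKANA"):
        return "Kana"
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "Hani"
    return "Zyyy"  # treat everything else as Common

# The three scripts that ISO 15924 lists under the [Jpan] alias.
JPAN = {"Hani", "Hira", "Kana"}

# Hypothetical helper: split text into runs of identical script tags.
# With merge_jpan=True, Han, Hiragana, and Katakana are all tagged "Jpan",
# which is a decision about the text as a whole, not about any character.
def script_runs(text, merge_jpan=False):
    runs = []
    for ch in text:
        tag = rough_script(ch)
        if merge_jpan and tag in JPAN:
            tag = "Jpan"
        if runs and runs[-1][0] == tag:
            runs[-1] = (tag, runs[-1][1] + ch)
        else:
            runs.append((tag, ch))
    return runs

sample = "日本語のテキスト"
print(script_runs(sample))                   # three runs: Hani, Hira, Kana
print(script_runs(sample, merge_jpan=True))  # one run tagged Jpan
```

The merge happens in the run-splitting step, which knows it is handling Japanese text. The per-character script values stay as they are.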

It seems wrong for Unicode to identify kana as "kanji plus kana" and also identify kanji as "kanji plus kana" to try to fix one broken implementation. Decades of precedent show that Unicode does not do that.

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org