Addition of Latin theta as a separate codepoint

Calvin Southwood calvin at wardbox.co.uk
Sun Apr 10 12:03:32 CDT 2022


I'm curious about the current state of adding a Latin theta to the Unicode
standard as a separate codepoint. I've seen various proposals that include
adding such a character as part of support for certain European languages,
but I'm not sure how far these have progressed.

My personal interest in a Latin theta stems from the fact that it's the
only Greek character used in the IPA that doesn't have a separate
codepoint for a Latin version of the character. From a linguistic
perspective this is a barrier to implementing a general abugida that can
be represented using the IPA: many current operating systems and browsers
(including Windows and Chrome) split a Greek theta and the adjacent Latin
characters into separate text runs. Shaping engines such as HarfBuzz
process one run at a time, so they can never render a ligature between a
Greek theta and a Latin vowel as a single glyph.
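
As a rough illustration of the segmentation problem, here is a minimal
Python sketch. It is not how Windows, Chrome or HarfBuzz actually itemize
text: Python's standard library does not expose the Unicode Script
property, so the hypothetical helper rough_script() guesses the script
from the first word of the character name, and split_runs() is likewise
an invented name.

```python
import unicodedata


def rough_script(ch):
    """Crude stand-in for the Unicode Script property (an assumption):
    use the first word of the character name, e.g. 'LATIN' or 'GREEK'."""
    name = unicodedata.name(ch, "")
    first = name.split(" ", 1)[0]
    return first if first in ("LATIN", "GREEK") else "OTHER"


def split_runs(text):
    """Split text into (script, substring) runs, starting a new run
    whenever the guessed script changes."""
    runs = []
    for ch in text:
        script = rough_script(ch)
        if runs and runs[-1][0] == script:
            runs[-1][1] += ch          # same script: extend the current run
        else:
            runs.append([script, ch])  # script changed: start a new run
    return [(script, chunk) for script, chunk in runs]


# The IPA transcription of "think" is cut in two at the theta.
print(split_runs("θɪŋk"))  # [('GREEK', 'θ'), ('LATIN', 'ɪŋk')]
```

Because the theta ends up in its own run, a shaper that is handed one run
at a time never sees the theta and the following vowel together.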

After an exchange with the maintainers of HarfBuzz, the proposed solutions
I've seen are either adding a Latin theta as a separate codepoint, or
adding Script_Extensions "Latn Grek" to the Greek theta so that engines
that use these extensions for text segmentation can treat a string of
Latin characters including a Greek theta as a single run.
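
The Script_Extensions option can be sketched in the same spirit. This is
only an illustration under stated assumptions: the table HYPOTHETICAL_SCX
models the proposed "Latn Grek" assignment and does not reflect current
Unicode data, the script of other characters is guessed from the first
word of the character name, and the function names are invented for the
example.

```python
import unicodedata

# Hypothetical: the proposed Script_Extensions value for U+03B8.
HYPOTHETICAL_SCX = {"\u03b8": {"LATIN", "GREEK"}}


def script_set(ch, extensions):
    """Return the set of scripts a character may belong to."""
    if ch in extensions:
        return set(extensions[ch])
    # Crude stand-in for the Script property (an assumption).
    first = unicodedata.name(ch, "").split(" ", 1)[0]
    return {first} if first in ("LATIN", "GREEK") else {"OTHER"}


def split_runs_scx(text, extensions):
    """Extend a run while its candidate scripts still intersect with
    those of the next character; otherwise start a new run."""
    runs = []
    for ch in text:
        scripts = script_set(ch, extensions)
        if runs:
            common = runs[-1][0] & scripts
            if common:
                runs[-1][0] = common   # narrow the run's candidate scripts
                runs[-1][1] += ch
                continue
        runs.append([scripts, ch])
    return [(sorted(scripts), chunk) for scripts, chunk in runs]


# Without the extension the theta is split off; with it, one Latin run.
print(split_runs_scx("θɪŋk", {}))                # [(['GREEK'], 'θ'), (['LATIN'], 'ɪŋk')]
print(split_runs_scx("θɪŋk", HYPOTHETICAL_SCX))  # [(['LATIN'], 'θɪŋk')]
```

Note that this only helps with engines that actually consult
Script_Extensions when segmenting text; a separate Latin theta codepoint
would not depend on that.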

While I am somewhat aware of arguments against this involving
disunification, I feel there is a strong argument in favour of such a
character from a linguistic and accessibility perspective due to its nature
as an IPA character. Can anyone give me some insight as to the current
thinking on this issue?

Thanks very much,
Calvin Southwood