Unicode Teaching in Universities

Tue Sep 7 05:04:29 CDT 2021

Doug Ewell wrote:

> (It had nothing to do with explicit selection of font styles or sizes 
> via "quasi-control characters," whatever those are.)

Actually, it was me who used the phrase "quasi-control character".

https://corp.unicode.org/pipermail/unicode/2021-September/009549.html

I know hardly anything about CJK encoding; I am trying to learn.

A quasi-control character would be a character that is encoded as an 
ordinary text character and could be displayed using a glyph. However, 
it could also (or instead) be used by a software system as a control 
character if that is what the end user prefers and he or she has such a 
software system available.

For example, there could be a quasi-control character which has a 
displayable glyph of a capital A and a capital G arranged in pale with 
the A above the G, all within a portrait-orientation rectangle, with a 
meaning of "Alphanumerics Green". It could be used in a Unicode plain 
text representation of a teletext page. (I mean a teletext page in 
English, French, German etc.; I am not referring to a quasi-control 
character for CJK in this example.) So in many uses the glyph would be 
displayed and would indicate the intended display to the human reader. 
In a specialist software application the quasi-control character could 
instead be acted upon, so that the subsequent text is displayed in 
green and a space is displayed in place of the glyph.
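
To make the two behaviours concrete, here is a minimal sketch in Python. 
It is only an illustration under stated assumptions: no "Alphanumerics 
Green" character is encoded in Unicode, so the Private Use Area code 
point U+E000 is used as a hypothetical stand-in, the string "[AG]" 
stands in for the glyph, and the function names are made up for this 
example. It also ignores teletext details such as the colour resetting 
at the start of each row.

```python
# Hypothetical stand-in: a Private Use Area code point plays the role of
# the "Alphanumerics Green" quasi-control character. No such character
# is actually encoded in Unicode.
ALPHA_GREEN = "\uE000"


def render_plain(text):
    """Ordinary display: show a visible stand-in for the glyph."""
    return text.replace(ALPHA_GREEN, "[AG]")


def render_specialist(text, default_colour="white"):
    """Specialist display: act on the character as a control.

    Returns a list of (run, colour) pairs. The quasi-control character
    is shown as a space, and the text that follows it is green.
    """
    runs = []
    colour = default_colour
    current = ""
    for ch in text:
        if ch == ALPHA_GREEN:
            if current:
                runs.append((current, colour))
            current = " "      # a space occupies the character's cell
            colour = "green"   # subsequent text is displayed in green
        else:
            current += ch
    if current:
        runs.append((current, colour))
    return runs


sample = "Hello" + ALPHA_GREEN + "World"
print(render_plain(sample))
print(render_specialist(sample))
```

The same stored text serves both kinds of software: a system that knows 
nothing of the convention simply shows the glyph, while a specialist 
system consumes the character and changes the colour.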

So I am simply wondering whether use of a quasi-control character for 
indicating the difference in the font style would solve the problem that 
is being discussed in the context of CJK if there is a need for a plain 
text solution.

> If you really need language tagging, to choose a font or render 
> punctuation or perform spell-checking or text-to-speech or some other 
> process, then use language tagging.

But alas U+E0001 has been deprecated.

> https://www.unicode.org/charts/PDF/UE0000.pdf

quote from that document

The use of tag characters to convey language tags is strongly 
discouraged.

Tag identifiers

E0001  LANGUAGE TAG

  • This character is deprecated, and its use is strongly discouraged.

end quote

Should U+E0001 LANGUAGE TAG become undeprecated?
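
For readers who have not met the mechanism, the following Python sketch 
shows how a language tag was spelled out with tag characters: U+E0001 
LANGUAGE TAG followed by tag characters in the range U+E0020 to U+E007E, 
each of which mirrors the ASCII character 0xE0000 below it. The function 
names are my own, chosen for this example, and the sketch is only meant 
to show the encoding. As quoted above, this use is deprecated, so it is 
not a recommendation.

```python
LANGUAGE_TAG = "\U000E0001"   # U+E0001 LANGUAGE TAG (deprecated)
TAG_OFFSET = 0xE0000          # tag characters mirror ASCII at this offset


def encode_language_tag(tag):
    """Spell an ASCII language tag such as "ja" using tag characters."""
    if not all(0x20 <= ord(c) <= 0x7E for c in tag):
        raise ValueError("language tag must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(ord(c) + TAG_OFFSET) for c in tag)


def decode_language_tag(text):
    """Read back a tag from the start of text, or return None."""
    if not text.startswith(LANGUAGE_TAG):
        return None
    chars = []
    for ch in text[len(LANGUAGE_TAG):]:
        cp = ord(ch)
        if 0xE0020 <= cp <= 0xE007E:
            chars.append(chr(cp - TAG_OFFSET))
        else:
            break   # first non-tag character ends the tag
    return "".join(chars)


tagged = encode_language_tag("ja") + "text follows here"
print([f"U+{ord(c):04X}" for c in encode_language_tag("ja")])
print(decode_language_tag(tagged))
```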

>> In analog era anyone can just write a new characters in ways they
>> desire and spread it around, and if the usage picked up then it would
>> become part of the language, but it's impossible to do the same
>> through Unicode.
> Nor through any of the Chinese or Japanese national standards. This is 
> a fact of life with standardized character sets in general, and has 
> nothing to do with Han unification.

Well, a system could in theory be introduced that would solve that 
problem, using a technique similar to that which has been proposed for 
QID emoji, yet as a separate system managed directly by Unicode Inc. 
Indeed there could be more than one such system, one (or maybe more than 
one?) for CJK glyphs, another for Latin-style characters and another 
for other systems. Registration would basically be more or less 
automatic and fairly prompt, with only mild moderation by Unicode Inc. 
Such systems would have the freedoms of the Private Use Areas yet also 
some of the precision of regular Unicode encoding as regards 
interoperability. That could be a major step forward in the development 
and application of Unicode.

William Overington

Tuesday 7 September 2021
