Why incomplete subscript/superscript alphabet ?
Marcel Schneider
charupdate at orange.fr
Wed Oct 5 08:57:48 CDT 2016
On Wed, 5 Oct 2016 14:27:44 +0900, Martin J. Dürst wrote:
> On 2016/10/04 19:35, Marcel Schneider wrote:
>> On Mon, 3 Oct 2016 13:47:09 -0700, Asmus Freytag (c) wrote:
>>
>>> Later, the beta and gamma were encoded for phonetic notation, but not the
>>> alpha.
>>>
>>> As a result, you can write basic formulas for select compounds, but not all.
>>> Given that these basic formulae don't need full 2-D layout, this still seems
>>> like an arbitrary restriction.
>>
>> When itʼs about informatics, arbitrary restrictions are precisely what gets me
>> upset. Those limitations are—as I wrote the other day—a useless worsening
>> of the usability and usefulness of a product.
>
> This kind of "let's avoid arbitrary limitations" argument works very
> well for subjects that are theoretical, straightforward, and rigid in
> nature. Many (but not all) subjects in computer science (informatics)
> are indeed of such a nature.
>
> The Unicode Consortium (or more specifically, the UTC) does a lot of
> hard work to create theories where appropriate, and to explain them
> where possible. But they recognize (and we should do so, too) that in
> the end, writing is a *cultural* phenomenon, where straightforward,
> rigid theories have severe limitations.
>
> From a certain viewpoint (the chemist's in the example above), the
> result may look arbitrary, but from another viewpoint (the
> phoneticist's), it looks perfectly fine. At first, it looks like it
> would be easy to fix such problems, but each fix risks to introduce new
> arbitrariness when seen from somebody else's viewpoint. Getting upset
> won't help.
Iʼve got the point, thanks. Phoneticians need to write running text that is
immediately legible, while a chemistry database may use particular notational
conventions that work with baseline letters, to be parsed semantically or given
light markup for proper display in the UI. The UTC decision thus questioned the
design principle of using plain text for chemical formulae. No doubt it was
understood that validating this choice would have opened the door to encoding
more special characters for upgrading or similar purposes.
At this point Iʼd like to mention what I thought about since this thread
was launched. The French language makes extensive use of superscripts
to note abbreviations. This is not a mere styling issue, as it is in English.
E.g. without superscripts, the abbreviation ‘nos’ [numbers] is indistinguishable from
the pronoun ‘nos’ [our]. The only one that can easily be disambiguated is ‘n°’ [number],
using the degree sign available on the common French keyboard layout.
As an anecdote: when a technician pointed me to the field
‘no centre mess’ in the UI of my cellphone, it took me several seconds to understand
it as ‘number of SMS center/centre’, which is the actual meaning; here, though, some
additional confusion resulted from the interlanguage homograph ‘no’.
Written words being ambiguous with one another is a common phenomenon in
natural languages. Disambiguation is widely achieved by adding
vowel signs (Hebrew) or diacritics (languages using the Latin script).
French was at a disadvantage in computer practice (applied informatics) during
the period when diacritics were unavailable, which lasted longer for uppercase
letters than for lowercase.
AFAIK, Latin letters like ‘ij’ and ‘œ’ first gained binary existence thanks
to the ISO 6937 charset, while a Dutch standards author asked his compatriots
to always write ‘ij’ with two ASCII letters, and two Frenchmen prevented the ‘œ’
from being encoded in Latin-1 at the intended code points because of its
non-existence in computer printers.
But today, thanks to Unicode, thatʼs all over. Therefore I suggest granting
the French language full support by making all superscript lowercase letters
available, so that the SUPERSCRIPT dead key that the French standards body
recommends will work for all abbreviations. There is no need for superscript
letters beyond the basic alphabet, as no French abbreviation exceeds this range
(despite what I believed in 2014, like many other people).
Additionally Iʼm proposing a modifier key combination (using a new modifier key on
the 105th key on ISO keyboards) to access the lowercase superscripts on live keys:
Shift + Num + [letter key] ➔ [superscript lowercase].
I can easily type ‘on the 105ᵗʰ key’, and so will all users in France, at least
with the dead key.
The missing letter is superscript q == MODIFIER LETTER SMALL Q.
Currently, when Shift + Num + Q is pressed in the projects,
‘ ↑q_n’existe_pas’ [superscript ‘q’ does not exist] is inserted.
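For illustration, here is a minimal Python sketch of the dead key behaviour
described above. The table lists the superscript lowercase letters that have a
code point as of this message, i.e. all basic letters except q. The names
SUPERSCRIPT_LOWERCASE, superscript_deadkey and abbreviate are hypothetical and
not taken from any keyboard driver or standard.

```python
import string

# Superscript lowercase letters with a code point, as described in this
# message: every basic Latin letter except q.
SUPERSCRIPT_LOWERCASE = {
    "a": "\u1d43", "b": "\u1d47", "c": "\u1d9c", "d": "\u1d48",
    "e": "\u1d49", "f": "\u1da0", "g": "\u1d4d", "h": "\u02b0",
    "i": "\u2071", "j": "\u02b2", "k": "\u1d4f", "l": "\u02e1",
    "m": "\u1d50", "n": "\u207f", "o": "\u1d52", "p": "\u1d56",
    "r": "\u02b3", "s": "\u02e2", "t": "\u1d57", "u": "\u1d58",
    "v": "\u1d5b", "w": "\u02b7", "x": "\u02e3", "y": "\u02b8",
    "z": "\u1dbb",
}


def superscript_deadkey(letter):
    """Return the superscript form of one lowercase letter, or None if absent."""
    return SUPERSCRIPT_LOWERCASE.get(letter)


def abbreviate(base, suffix):
    """Hypothetical helper: base text followed by a superscripted suffix."""
    out = []
    for ch in suffix:
        sup = superscript_deadkey(ch)
        if sup is None:
            raise ValueError(f"no superscript form for {ch!r}")
        out.append(sup)
    return base + "".join(out)


# Letters of the basic alphabet that the table cannot superscript.
missing = [c for c in string.ascii_lowercase if c not in SUPERSCRIPT_LOWERCASE]

print(abbreviate("105", "th"))
print(abbreviate("n", "os"))
print(missing)
```

With this table, the only basic letter for which the dead key has nothing to
output is q, which is why a placeholder string is needed there.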
Karl Pentzlin had the merit of proposing the missing letter superscript q
for use in French abbreviations, but the UTC must have refused by arguing
from English usage and from French recommendations. These are now changing.
Moreover, as I tried to demonstrate above, one cannot always rely on such
low-profile recommendations, which express the humility and undemanding nature
of their author more than real practical needs and linguistic requirements.
As for searchability, Google even includes the mathematical alphabets in its
equivalence classes, so that any query written e.g. in double-struck letters
is read as if it were entered in plain ASCII.
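How Google builds its equivalence classes is not something I can show. As an
assumption, a similar effect can be approximated with Unicode compatibility
normalization (NFKC), which maps both double-struck and superscript letters
back to plain letters. The Python sketch below uses only the standard
unicodedata module; the helper names to_doublestruck and fold_for_search are
hypothetical.

```python
import unicodedata


def to_doublestruck(text):
    """Map ASCII lowercase letters to MATHEMATICAL DOUBLE-STRUCK SMALL letters."""
    base = 0x1D552  # MATHEMATICAL DOUBLE-STRUCK SMALL A
    return "".join(
        chr(base + ord(ch) - ord("a")) if "a" <= ch <= "z" else ch
        for ch in text
    )


def fold_for_search(text):
    """Assumed folding step: compatibility normalization, then case folding."""
    return unicodedata.normalize("NFKC", text).casefold()


query = to_doublestruck("unicode")
print(fold_for_search(query))              # double-struck query folded to ASCII
print(fold_for_search("105\u1d57\u02b0"))  # superscript 'th' folded to 'th'
```

Under this assumption, text typed with superscript letters stays findable by a
plain ASCII query, since both sides fold to the same string.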
Best regards,
Marcel