Devanagari and Subscript and Superscript

Plug Gulp plug.gulp at gmail.com
Tue Dec 15 05:55:02 CST 2015


On Wed, Dec 9, 2015 at 5:18 AM, Martin J. Dürst <duerst at it.aoyama.ac.jp> wrote:
>
> I suggest using HTML:
>
> ब<sup>क्ष</sup>
>

This will work only if the end-users are always going to use a web
browser to view the text content.

It would help if the Unicode Standard itself intrinsically supported
generalised subscript/superscript text. I think the meaning of the
text should be contained within the text itself rather than relying on
external markup and viewers. That way the content creator does not
have to depend on which Unicode-compliant text viewer or editor the
end user is using. The text should retain its meaning irrespective of
the Unicode-compliant text viewer or editor used. Similarly, if the
text has to be saved in a database without losing its meaning, then
either it has to be saved with all the known markers of all the
available editors, or some special processing needs to be added to
convert a saved marker into the markers of the various available text
viewers and editors. Generalised Unicode support for superscript and
subscript would solve all these problems.

The following is one use case where general Unicode support for
superscript/subscript would help tremendously:

A math teacher (गणिताचे शिक्षक) in a Marathi (मराठी) language school is
writing notes, in her Unicode-compliant plain text editor, to explain
mathematical terms to her students. The following is an excerpt from
the notes that explains terms such as exponent (घातांक) and base (पाया).
(An English translation is given below.)

"जेव्हा एखाद्या संखेचा स्वतःशीच अनेक वेळा गुणाकार होतो तेव्हा त्या
गुणाकाराला थोडक्यात लिहिण्याच्या पद्धतीला घातांक असे म्हणतात.
उदाहरणार्थ, ५ ही संख्या जर स्वतःशी ३ वेळा गुणली जात असेल, म्हणजे ५ x ५
x ५, तर त्याला घातांक पद्धतीत ५^३ असे लिहितात. ह्या घातांकीय रचनेला "५
चा ३ रा घात" असे म्हणतात. आपण अजून एक उदाहरण घेऊया, "२ ना चा १० वा
घात", म्हणजे २ ही संख्या स्वतःशी १० वेळा गुणली गेली आहे. ह्याला आपण
२^१० असे लिहितो. तर साधारणपणे, कूठलीही संख्या ब जेव्हा स्वतःशी क्ष
वेळा गुणलीजाते तेव्हा त्याला घातांक पद्धतीत ब^क्ष असे लिहितात, आणि
त्या रचनेला "ब चा क्ष वा घात" असे म्हणतात. इथे ब ह्या संखेला पाया
म्हणतात आणि क्ष ह्या संखेला घात असे म्हणतात. तर थोडक्यात, घातांकीय
रचनेला पाया^घात असे लिहितात."

English translation:
"An exponent is a shorthand notation that denotes the multiplication of
a number by itself a number of times. For example, if the number 5 is
multiplied by itself 3 times, i.e. 5 x 5 x 5, then it is represented in
exponential form as 5^3. This exponential term is referred to as "5
raised to the power of 3". Let us consider another example, "2 raised
to the power of 10", i.e. 2 is multiplied by itself 10 times. This is
written in exponential form as 2^10. So, in general, any number b that
is multiplied by itself k times is written as b^k, and the term is
referred to as "b raised to the power of k". The number b is called
the base, and the number k is called the exponent. In short, an
exponential term is written as base^exponent."

Please note that the teacher had to use a Circumflex Accent (caret) to
indicate superscript, which is an unwritten convention, in the absence
of proper superscript support within Unicode. To make the text
available to a wider audience and still retain its meaning, the
teacher will have to rely partly on Unicode support, partly on the
markers available in the various text viewers of her students, partly
on the markers available in the text editors of the peer reviewers of
her text, and partly on unwritten conventions (such as the caret).
This conundrum can be resolved only if there is generalised support
for superscript and subscript within the Unicode Standard.
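
To make the gap concrete, here is a minimal Python sketch using only
the standard library. It shows that preformed superscripts exist for
ASCII digits but that nothing comparable is encoded for Devanagari
digits or letters. The helper names to_superscript and
superscript_targets are made up for this illustration, and the result
of the scan depends on the Unicode data shipped with the Python
interpreter in use.

```python
import unicodedata

# Preformed superscript digits that Unicode already encodes.
# U+00B9, U+00B2 and U+00B3 are in Latin-1 Supplement; the others are in
# the Superscripts and Subscripts block (U+2070..U+209F).
SUPERSCRIPT_DIGITS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")


def to_superscript(text):
    """Replace ASCII digits with their preformed superscript forms.

    Characters that have no entry in the table are returned unchanged.
    """
    return text.translate(SUPERSCRIPT_DIGITS)


def superscript_targets():
    """Collect every single character that some code point maps to
    through a <super> compatibility decomposition."""
    targets = set()
    for cp in range(0x110000):
        decomp = unicodedata.decomposition(chr(cp))
        if decomp.startswith("<super> "):
            parts = decomp.split()[1:]
            if len(parts) == 1:
                targets.add(chr(int(parts[0], 16)))
    return targets


if __name__ == "__main__":
    print("5" + to_superscript("3"))   # ASCII digit: prints 5³
    print("५" + to_superscript("३"))   # Devanagari digit is left unchanged
    targets = superscript_targets()
    print("3" in targets, "३" in targets, "ब" in targets)
```

With ASCII digits the teacher could write the exponent directly in
plain text; with Devanagari digits or letters she has to fall back to
the caret.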

The standard already has a Superscripts and Subscripts block.
Generalising and extending this support will help other languages and
scripts. General support for all characters, words and sentences could
be achieved by just three new formatting characters, e.g. SCR, SUP and
SUB, similar to the way other formatting characters such as ZWSP, ZWJ,
ZWNJ etc. are defined. The new formatting characters could be defined
as:

SCR: In a character stream, all the characters following this
formatting character shall be treated as normal text until either the
end of the character stream or the next SUP or SUB character is
reached. This shall be the default marker, i.e. if no marker is
specified, the text shall be treated as normal text until either the
end of the character stream or the next SUP or SUB character is
reached.

SUP: In a character stream, all the characters following this
formatting character shall be treated as superscript text until either
the end of the character stream or the next SCR or SUB character is
reached.

SUB: In a character stream, all the characters following this
formatting character shall be treated as subscript text until either
the end of the character stream or the next SCR or SUP character is
reached.
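
The three definitions above can be sketched as a small state machine.
The Python below is only an illustration under stated assumptions: SCR,
SUP and SUB are not encoded in Unicode, so Private Use Area code points
(U+E000..U+E002) are borrowed as stand-ins, and the function names
split_runs and to_html are hypothetical. HTML is used as just one
possible rendering target.

```python
from html import escape

# Hypothetical stand-ins: SCR, SUP and SUB do NOT exist in Unicode.
# Private Use Area code points are borrowed purely for illustration.
SCR = "\uE000"  # switch back to normal text
SUP = "\uE001"  # switch to superscript text
SUB = "\uE002"  # switch to subscript text

MODES = {SCR: "normal", SUP: "super", SUB: "sub"}


def split_runs(stream):
    """Split a character stream into (mode, text) runs.

    Text is "normal" until a SUP or SUB is seen; each marker stays in
    effect until a different marker or the end of the stream.
    """
    runs = []
    mode = "normal"  # SCR is the implicit default
    buf = []
    for ch in stream:
        if ch in MODES:
            new_mode = MODES[ch]
            if new_mode != mode:
                if buf:
                    runs.append((mode, "".join(buf)))
                    buf = []
                mode = new_mode
        else:
            buf.append(ch)
    if buf:
        runs.append((mode, "".join(buf)))
    return runs


def to_html(stream):
    """Render the runs as HTML, one possible target among many."""
    templates = {
        "normal": "{}",
        "super": "<sup>{}</sup>",
        "sub": "<sub>{}</sub>",
    }
    return "".join(templates[m].format(escape(t)) for m, t in split_runs(stream))


if __name__ == "__main__":
    print(split_runs("ब" + SUP + "क्ष"))
    print(to_html("ब" + SUP + "क्ष"))        # prints ब<sup>क्ष</sup>
    print(to_html("H" + SUB + "2" + SCR + "O"))  # prints H<sub>2</sub>O
```

Because the markers are stateful rather than paired, a viewer only
needs to track the current mode while reading the stream, which is the
behaviour the definitions describe.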

General support within Unicode for subscripting and superscripting
text (characters and words) will tremendously help languages and
scripts other than English/Latin.

Thanks and kind regards,

~Plug





>>
>> Hi,
>>
>> I am trying to understand if there is a way to use Devanagari
>> characters (and grapheme clusters) as subscript and/or superscript in
>> unicode text. It will help if someone could please direct me to any
>> document that explains how to achieve that. Is there a unicode marker
>> that will treat the next grapheme cluster in the unicode text as
>> super/subscript? For e.g. if one wants to represent "ब raise to क्ष"
>> how does one achieve that; is there a marker to represent it as
>> follows: ब + SUP + क + ् + ष
>> where SUP acts as a marker for superscripting the next grapheme
>> cluster. Similar for subscripting.
>>
>> Sorry if this is not the right place to ask this question; in that
>> case please could you direct me to the right forum?
>>
>> Thanks and kind regards
>>
>> ~Plug
>>
>> .
>>
>


