From plug.gulp at gmail.com Tue Dec 8 21:24:39 2015
From: plug.gulp at gmail.com (Plug Gulp)
Date: Wed, 9 Dec 2015 03:24:39 +0000
Subject: Devanagari and Subscript and Superscript

Hi,

I am trying to understand if there is a way to use Devanagari
characters (and grapheme clusters) as subscript and/or superscript in
Unicode text. It would help if someone could please direct me to any
document that explains how to achieve that. Is there a Unicode marker
that will treat the next grapheme cluster in the Unicode text as
super/subscript? For example, if one wants to represent "? raised to ???",
how does one achieve that? Is there a marker to represent it as
follows: ? + SUP + ? + ? + ?
where SUP acts as a marker for superscripting the next grapheme
cluster? Similarly for subscripting.

Sorry if this is not the right place to ask this question; in that
case, could you please direct me to the right forum?

Thanks and kind regards

~Plug

From maxwell at umiacs.umd.edu Wed Dec 9 09:42:13 2015
From: maxwell at umiacs.umd.edu (maxwell)
Date: Wed, 09 Dec 2015 10:42:13 -0500
Subject: Devanagari and Subscript and Superscript

On 2015-12-08 22:24, Plug Gulp wrote:
> I am trying to understand if there is a way to use Devanagari
> characters (and grapheme clusters) as subscript and/or superscript in
> Unicode text. It would help if someone could please direct me to any
> document that explains how to achieve that. Is there a Unicode marker
> that will treat the next grapheme cluster in the Unicode text as
> super/subscript? For example, if one wants to represent "? raised to
> ???", how does one achieve that? Is there a marker to represent it as
> follows: ? + SUP + ? + ? + ?
> where SUP acts as a marker for superscripting the next grapheme
> cluster? Similarly for subscripting.

I may be wrong (it's been known to happen), but I don't think there's
anything in Unicode that will sub-/super-script an arbitrary character.
There are some pre-sub-/super-scripted Latin characters (see
https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts), but
that won't help you.

So the next thing is: what are you using to display text? HTML, Word,
LibreOffice, (Xe)LaTeX, ...? Because it will probably have to be done in
that tool.

Mike Maxwell

From richard.wordingham at ntlworld.com Fri Dec 11 06:28:48 2015
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Fri, 11 Dec 2015 12:28:48 +0000
Subject: Devanagari and Subscript and Superscript
Message-ID: <20151211122848.03ad0d7b@JRWUBU2>

On Wed, 9 Dec 2015 03:24:39 +0000
Plug Gulp wrote:

> I am trying to understand if there is a way to use Devanagari
> characters (and grapheme clusters) as subscript and/or superscript in
> Unicode text.

Why do you want to do this? Are you asking about writing Devanagari
vertically rather than horizontally? If that is what you want, you
should be looking at mark-up such as is found in cascading style sheets
(CSS). It is an important issue for CJK and Mongolian, and there have
been questions as to what is needed for Indian scripts. (There's also
an antiquarian interest for historical scripts, such as Phags-pa and
even Egyptian; moves are afoot to support the hieroglyphic script as
plain text.)

Richard.

From plug.gulp at gmail.com Tue Dec 15 05:55:02 2015
From: plug.gulp at gmail.com (Plug Gulp)
Date: Tue, 15 Dec 2015 11:55:02 +0000
Subject: Devanagari and Subscript and Superscript
In-Reply-To: <5667B9B5.3010208@it.aoyama.ac.jp>
References: <5667B9B5.3010208@it.aoyama.ac.jp>

On Wed, Dec 9, 2015 at 5:18 AM, Martin J. Dürst wrote:
>
> I suggest using HTML:
>
> ?? ??
>

This will work only if the end users are always going to use a web
browser to view the text content. It would help if the Unicode Standard
itself intrinsically supported generalised subscript/superscript text.
I think the meaning of the text should be contained within the text
itself rather than relying on external markup and viewers. That way the
content creator does not have to depend on which Unicode-compliant text
viewer or editor the end user is using: the text should retain its
meaning irrespective of the viewer or editor used. Similarly, if the
text has to be saved in a database without losing its meaning, then
either it has to be saved with the markup of every available editor, or
some special processing is needed to convert the saved markup into the
markup of each of the various text viewers and editors. Generalised
Unicode support for superscript and subscript would solve all these
problems.

The following is one use case where general Unicode support for
superscript/subscript would help tremendously:

A math teacher (??????? ??????) in a Marathi (?????) language school is
writing notes, in her Unicode-compliant plain text editor, to explain
mathematical terms to her students. The following excerpt from the
notes explains terms such as exponent (??????) and base (????); an
English translation is given below:

"?????? ??????? ?????? ???????? ???? ???? ??????? ???? ?????? ???? ?????????? ???????? ???????????? ???????? ?????? ??? ???????. ??????????, ? ?? ?????? ?? ??????? ? ???? ????? ??? ????, ?????? ? x ? x ?, ?? ?????? ?????? ??????? ?^? ??? ???????. ???? ???????? ?????? "? ?? ? ?? ???" ??? ???????. ??? ???? ?? ?????? ?????, "? ?? ?? ?? ?? ???", ?????? ? ?? ?????? ??????? ?? ???? ????? ???? ???. ?????? ??? ?^?? ??? ??????. ?? ?????????, ??????? ?????? ? ?????? ??????? ??? ???? ????????? ?????? ?????? ?????? ??????? ?^??? ??? ???????, ??? ???? ?????? "? ?? ??? ?? ???" ??? ???????. ??? ? ???? ?????? ???? ??????? ??? ??? ???? ?????? ??? ??? ???????. ?? ????????, ???????? ?????? ????^??? ??? ???????."
English translation:

"An exponent is a shorthand notation for multiplying a number by itself
a number of times. For example, if the number 5 is multiplied by itself
3 times, i.e. 5 x 5 x 5, then it is represented in exponential form as
5^3. This exponential term is read as "5 raised to the power of 3". Let
us consider another example, "2 raised to the power of 10", i.e. 2 is
multiplied by itself 10 times. This is written in exponential form as
2^10. So, in general, any number b that is multiplied by itself k times
is written as b^k, and the term is read as "b raised to the power of
k". The number b is called the base, and the number k is called the
exponent. In short, an exponential term is written as base^exponent."

Please note that the teacher had to use the CIRCUMFLEX ACCENT (caret,
U+005E) to indicate superscript, which is an unwritten convention, in
the absence of proper superscript support within Unicode. To make the
text available to a wider audience and still retain its meaning, the
teacher has to rely partly on Unicode support, partly on the markup
available in her students' various text viewers, partly on the markup
available in her peer reviewers' text editors, and partly on unwritten
conventions (such as the caret). This conundrum can be resolved only if
there is generalised support for superscript and subscript within the
Unicode Standard. The standard already has a Superscripts and
Subscripts block; generalising and extending this support would help
other languages and scripts. General support for all characters, words
and sentences could be achieved with just three new formatting
characters, e.g. SCR, SUP and SUB, similar to the way other formatting
characters such as ZWSP, ZWJ and ZWNJ are defined.
The new formatting characters could be defined as follows:

SCR: In a character stream, all the characters following this
formatting character shall be treated as normal text until either the
end of the character stream or the next SUP or SUB character is
reached. This shall be the default, i.e. if no marker is specified,
the text shall be treated as normal text until either the end of the
character stream or the next SUP or SUB character is reached.

SUP: In a character stream, all the characters following this
formatting character shall be treated as superscript text until either
the end of the character stream or the next SCR or SUB character is
reached.

SUB: In a character stream, all the characters following this
formatting character shall be treated as subscript text until either
the end of the character stream or the next SCR or SUP character is
reached.

General support within Unicode for subscripting and superscripting text
(characters and words) would help tremendously for languages and
scripts other than English/Latin.

Thanks and kind regards,

~Plug

>>
>> Hi,
>>
>> I am trying to understand if there is a way to use Devanagari
>> characters (and grapheme clusters) as subscript and/or superscript in
>> Unicode text. It would help if someone could please direct me to any
>> document that explains how to achieve that. Is there a Unicode marker
>> that will treat the next grapheme cluster in the Unicode text as
>> super/subscript? For example, if one wants to represent "? raised to ???",
>> how does one achieve that? Is there a marker to represent it as
>> follows: ? + SUP + ? + ? + ?
>> where SUP acts as a marker for superscripting the next grapheme
>> cluster? Similarly for subscripting.
>>
>> Sorry if this is not the right place to ask this question; in that
>> case, could you please direct me to the right forum?
>>
>> Thanks and kind regards
>>
>> ~Plug
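The SCR/SUP/SUB semantics proposed in this thread amount to a small state machine over the character stream. The Python sketch below illustrates that reading of the proposal and is hypothetical throughout: Unicode defines no such characters, so it assumes Private Use Area code points U+E000 to U+E002 as stand-ins, and it renders the result with HTML <sup>/<sub> only as one possible target format.

```python
import html

# HYPOTHETICAL stand-ins for the proposed SCR/SUP/SUB characters. Unicode
# defines no such format characters; Private Use Area code points are
# borrowed here purely for illustration.
SCR = "\uE000"
SUP = "\uE001"
SUB = "\uE002"

MODES = {SCR: "normal", SUP: "sup", SUB: "sub"}


def segment(text: str) -> list[tuple[str, str]]:
    """Split text into (mode, run) pairs under the proposed rules.

    The stream starts in "normal" mode (the SCR default). Each marker
    switches the mode for everything after it, until the next marker or
    the end of the stream. Markers themselves are not part of any run.
    """
    segments: list[tuple[str, str]] = []
    mode = "normal"
    run: list[str] = []
    for ch in text:
        if ch in MODES:
            new_mode = MODES[ch]
            # Close the current run only when the mode actually changes,
            # so a redundant marker (e.g. SUP after SUP) is a no-op.
            if new_mode != mode and run:
                segments.append((mode, "".join(run)))
                run = []
            mode = new_mode
        else:
            run.append(ch)
    if run:
        segments.append((mode, "".join(run)))
    return segments


def to_html(text: str) -> str:
    """Render the segments as HTML, using <sup>/<sub> for raised/lowered runs."""
    parts: list[str] = []
    for mode, run in segment(text):
        escaped = html.escape(run)
        if mode == "normal":
            parts.append(escaped)
        else:
            # mode is "sup" or "sub", which double as the HTML tag names
            parts.append(f"<{mode}>{escaped}</{mode}>")
    return "".join(parts)


if __name__ == "__main__":
    # "2 raised to the power of 10" in Devanagari digits (U+0968, U+0967, U+0966)
    print(to_html("\u0968" + SUP + "\u0967\u0966"))
```

Run as a script, it prints २<sup>१०</sup>, i.e. "2 raised to the power of 10" written with Devanagari digits. Note that this follows the stateful definitions (a marker applies until the next marker), not the per-grapheme-cluster SUP marker sketched in the original question; under the stateful definitions, an exponent followed by ordinary text has to be closed explicitly with SCR.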