Moving The Hebrew Extended Block Into The SMP

Mark Shoulson mark at kli.org
Tue May 10 21:46:04 CDT 2016


On 05/10/2016 09:08 PM, Robert Wheelock wrote:
>
> ·U+30000—U+30014 (21 codepoints):  Additional characters for 
> typesetting Biblical/Classical Hebrew

Do you have this list available yet?  I'm curious about these points, 
and others.
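
As a quick sanity check on the arithmetic in the ranges quoted in this message, here is a minimal sketch. The names QUOTED_RANGES, range_size and plane are illustrative only and belong to no proposal; the endpoints and counts are copied from the quoted text.

```python
# Ranges as quoted in this message: (first, last, claimed count).
QUOTED_RANGES = [
    (0x30000, 0x30014, 21),
    (0x30015, 0x3001F, 11),
    (0x30020, 0x30021, 2),
    (0x30022, 0x30041, 32),
    (0x30043, 0x3005C, 26),
    (0x30060, 0x30071, 18),
    (0x3007E, 0x3008F, 18),
    (0x30090, 0x3010F, 128),
    (0x30110, 0x3012F, 32),
    (0x30130, 0x3014F, 32),
]


def range_size(first, last):
    """Number of code points in an inclusive range."""
    return last - first + 1


def plane(cp):
    """Unicode plane number: 0 is the BMP, 1 is the SMP, and so on."""
    return cp >> 16


# Ranges whose claimed count disagrees with the inclusive size.
mismatches = [
    (first, last, claimed)
    for first, last, claimed in QUOTED_RANGES
    if range_size(first, last) != claimed
]

# Set of planes touched by the endpoints of the quoted ranges.
planes = {plane(cp) for first, last, _ in QUOTED_RANGES for cp in (first, last)}

print("mismatched counts:", mismatches)
print("planes used:", sorted(planes))
```

By this arithmetic the quoted counts agree with their endpoints, and the endpoints all fall in plane 3 rather than plane 1, which may be worth keeping in mind wherever "SMP" comes up in this thread.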

> ·U+30015—U+3001F (11 codepoints):  Palestinian vowel and pronunciation 
> points for Hebrew and Galilean Aramaic
> ·U+30020—U+30021 (2 codepoints):  Small superscript top-left signs for 
> the letter /shin/—superscript śin and superscript shin

I thought SIN was indicated sometimes by a SAMEKH written above the 
letter.  How would putting a SIN (which is just a SHIN with a dot on the 
left instead of the right) on top of the letter be any improvement (or 
difference) over just putting the dot on the left of the base letter in 
the first place?
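
To make the point about the dot concrete, here is a small sketch using Python's standard unicodedata module. It only shows how SIN is already represented in the BMP (SHIN followed by a combining SIN DOT); the variable names are my own.

```python
import unicodedata

SHIN = "\u05E9"      # HEBREW LETTER SHIN
SHIN_DOT = "\u05C1"  # HEBREW POINT SHIN DOT (upper right)
SIN_DOT = "\u05C2"   # HEBREW POINT SIN DOT (upper left)

shin = SHIN + SHIN_DOT
sin = SHIN + SIN_DOT

# Print the code points that make up each pointed letter.
for label, text in (("shin", shin), ("sin", sin)):
    for ch in text:
        print(label, f"U+{ord(ch):04X}", unicodedata.name(ch))

# The precomposed presentation form decomposes to the same sequence.
precomposed_sin = "\uFB2B"  # HEBREW LETTER SHIN WITH SIN DOT
decomposed = unicodedata.normalize("NFD", precomposed_sin)
print(decomposed == sin)
```

The two pointed letters differ only in which combining dot follows the same base SHIN.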

> ·U+30022—U+30041 (32 codepoints):  Palestinian cantillation signs for 
> Hebrew and Galilean Aramaic
> ·U+30042 is reserved
> ·U+30043—U+3005C (26 codepoints):  Babylonian vowel and pronunciation 
> points for Hebrew
> ·U+3005D—U+3005F are reserved
> ·U+30060—U+30071 (18 codepoints):  Babylonian cantillation signs for 
> Hebrew
> ·U+30072—U+3007D are reserved
> ·U+3007E—U+3008F (18 codepoints):  Samaritan vowel points, 
> pronunciation points, and cantillation signs for Hebrew (copies of 
> those also being used for Samaritan script in BMP)

OK, here I'm confused.  Why do we need copies?  Unicode doesn't like to 
encode redundant things, and duplicates only make for messes (when do 
you use which ZIQAA?).  If we have the characters in the BMP, we don't 
need them in the SMP.
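
For reference, a short sketch that looks up what the existing Samaritan block already provides. It scans U+0800..U+083F with the standard unicodedata module and lists any character whose name contains "ZIQAA"; the block bounds are the standard Samaritan block, and the variable names are illustrative.

```python
import unicodedata

# The Samaritan block in the BMP.
SAMARITAN_BLOCK = range(0x0800, 0x0840)

ziqaa = []
for cp in SAMARITAN_BLOCK:
    # Unassigned code points have no name; use "" as the default.
    name = unicodedata.name(chr(cp), "")
    if "ZIQAA" in name:
        ziqaa.append((cp, name))

for cp, name in ziqaa:
    print(f"U+{cp:04X} {name}")
```

Any character found this way already has a code point below U+10000, which is the sense in which a second copy elsewhere would be redundant.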

> ·U+30090—U+3010F (128 codepoints):  Additional characters in Hebrew 
> script for other Jewish languages (these are pointed like the 
> corresponding Arabic characters in the BMP)

So additional Hebrew "letters" that take Arabic vowel-points?  Makes 
sense; I saw some of that with Samaritan (particularly with DAMMA). We 
should probably just use the Arabic vowel code-points though.
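
A minimal sketch of what "just use the Arabic vowel code-points" could look like in plain text: a Hebrew base letter followed by the existing ARABIC DAMMA. The choice of BET as the base letter is purely an illustrative assumption, and whether a given font renders the pair well is a separate question.

```python
import unicodedata

BET = "\u05D1"    # HEBREW LETTER BET (arbitrary example base letter)
DAMMA = "\u064F"  # ARABIC DAMMA, an existing combining mark

pointed = BET + DAMMA

# Show each code point with its name, general category and combining class.
for ch in pointed:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )
```

DAMMA is a nonspacing mark, so nothing in the encoding ties it to an Arabic base letter.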

> ·U+30110—U+3012F (32 codepoints):  Basic Hebrew superscript characters 
> (regular letters+5 final forms+top-left pointed /śin/+top-right 
> pointed /shin/+/maqqef/)
> ·U+30130—U+3014F (32 codepoints):  Basic Hebrew subscript characters 
> (regular letters+5 final forms+top-left pointed /śin/+top-right 
> pointed /shin/+/maqqef/)

When you say "superscript" (or "subscript"), do you mean "spacing 
character that's written small and raised/lowered"?  Or do you mean 
"combining character that's written above/below another character"?  
(Cf. the difference between U+2071 SUPERSCRIPT LATIN SMALL LETTER I and 
U+0365 COMBINING LATIN SMALL LETTER I.)  If the former, is there a 
reason this has to be done as plain-text and can't be handled by 
higher-level markup?  Probably every major script has been written small 
and high in some places, but we don't have superscript versions of every 
letter in Unicode.
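
The distinction between the two Latin characters mentioned above can be inspected with Python's standard unicodedata module. This is only a sketch for comparison; the describe helper is my own name.

```python
import unicodedata

SUPERSCRIPT_I = "\u2071"  # SUPERSCRIPT LATIN SMALL LETTER I (spacing)
COMBINING_I = "\u0365"    # COMBINING LATIN SMALL LETTER I (combining)


def describe(ch):
    """Collect the properties that separate spacing from combining characters."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),
        "combining_class": unicodedata.combining(ch),
        "decomposition": unicodedata.decomposition(ch),
    }


for ch in (SUPERSCRIPT_I, COMBINING_I):
    print(f"U+{ord(ch):04X}", describe(ch))

# Compatibility normalization folds the spacing superscript to a plain "i",
# which is what markup-based superscripting would start from.
folded = unicodedata.normalize("NFKC", SUPERSCRIPT_I)
print(folded)
```

The spacing superscript has combining class 0 and a compatibility decomposition to an ordinary letter, while the combining form is a nonspacing mark that attaches to the preceding base character.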


~mark