Why do the Hebrew Alphabetic Presentation Forms Exist

jenkins john_h_jenkins at apple.com
Mon Jun 8 13:09:38 CDT 2020


Unicode *encoded* characters that other character sets had, even though it normally wouldn’t have, for the sake of round-trip compatibility. That’s really not done anymore. It also depends on what the character set in question is. The two mentioned here are, IMHO, too obscure to have ever been covered by round-trip compatibility.

> On Jun 8, 2020, at 11:45 AM, Abraham Gross via Unicode <unicode at unicode.org> wrote:
> 
> Unicode encodes characters that other character sets have even though it normally wouldn't. So if I find a character set with a folded lamed they'd add it?
> 
> Here are 2 character sets with a folded lamed:
> https://i.imgur.com/iq8awBe.jpg – an אלף בינה with the standing and folded lameds as separate letters.
> https://www.tug.org/TUGboat/tb15-3/tb44haralambous-hebrew.pdf#page=12 – A TeX typesetting module with the standing and folded lameds as separate characters for fine-grain control when the automatic system doesn't work.
> 
> On Jun 7, 2020, at 10:27, "Mark E. Shoulson via Unicode" <unicode at unicode.org> wrote:
> 
>> On 6/7/20 7:46 AM, Richard Wordingham via Unicode wrote:
>> 
>> I agree.  Sorry, pretty typography is nice and everything, but if bent LAMED is anything, it's at
>> best a presentation form (and even that is a hard sell.)  You show ANYONE a word spelled with any
>> combination of bent and straight LAMEDs and ask how it's spelled, they'll just say "LAMED" for each
>> one.  Unicode encodes different *characters*, symbols that have a different *meaning* in text, not
>> things that happen to look different.  A U+05BA HOLAM HASER FOR VAV means not just "a dot like
>> U+05B9 only shifted over a little," it means that there is something *different* going on: VAV plus
>> HOLAM usually means one thing (a VAV as mater lectionis for an /o/ vowel), this is a consonantal
>> VAV followed by a vowel.  In spelling it out, you could call one a holam malé, but not the other. 
>> A QAMATS QATAN is not just a qamats that looks a little different, it is a grammatically distinct
>> character, and moreover one that cannot be deduced algorithmically by looking at the letters around
>> it.  What you're talking about is a LAMED and a LAMED.  They are two *glyphs* for the same
>> character, and Unicode doesn't encode glyphs (anymore?)
>> 
>> ~mark
> 
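For anyone who wants to see the character-versus-glyph distinction in the data itself, here is a minimal sketch using Python's standard unicodedata module. It is only an illustration, not anything from the thread: the helper name describe() is made up, and the output depends on the Unicode version bundled with your Python. The points that Mark describes as distinct characters (U+05BA, U+05C7) carry no decomposition mapping, while the existing presentation forms (U+FB25 WIDE LAMED, U+FB4F ALEF LAMED ligature) map back to the ordinary letters under compatibility normalization.

```python
import unicodedata


def describe(ch):
    """Return (code point, character name, decomposition mapping) for one character.

    Hypothetical helper for illustration only.
    """
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.decomposition(ch))


# Points encoded as distinct characters: the decomposition field is empty.
for ch in "\u05B9\u05BA\u05B8\u05C7":
    print(describe(ch))

# Presentation forms: the decomposition points back at plain letters,
# and NFKC normalization folds them into those letters.
for ch in "\uFB25\uFB4F":
    folded = unicodedata.normalize("NFKC", ch)
    print(describe(ch), "->", [f"U+{ord(c):04X}" for c in folded])
```

A bent LAMED, if it were ever encoded, would presumably land in the second group: a compatibility variant that normalizes away to U+05DC, rather than a character with its own meaning.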



