Why do the Hebrew Alphabetic Presentation Forms Exist

David Starner prosfilaes at gmail.com
Thu Jun 4 12:05:00 CDT 2020

On Wed, Jun 3, 2020 at 10:51 PM abrahamgross--- via Unicode
<unicode at unicode.org> wrote:
> Why do the final forms of the Hebrew letters (םןץףך) exist as separate codepoints from their regular counterparts (מנצפכ), when Arabic - which has up to 4 forms for each letter - only got a single codepoint per letter?

Because encoding is full of somewhat arbitrary choices. For alphabets
with a handful of variant forms, like Latin, Greek, and Hebrew, it's
easier and more expected to encode those forms separately than to
complicate systems with a special case. Keyboard entry can go directly into a
buffer with minimal massaging. Scripts like Arabic, where each letter
takes up to four forms, would be harder to deal with under that model;
you can't expect keyboard users to type each form separately, so either
you add a heavy input manager, or you encode each letter once and let
the font deal with the different forms. (That approach has its
problems; I suspect that if the Persian script had been encoded
separately, or if Persian were the main user of the Arabic script, it
would have been encoded slightly differently, as Persian uses ZWJ and
ZWNJ more frequently to force forms. But the current encoding still
works for Persian; it's just a matter of tradeoffs.)
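
To make the contrast concrete, here is a rough sketch using Python's
standard unicodedata module. The helper names (describe, folds_to_base)
and the choice of BEH as the sample Arabic letter are my own, purely for
illustration. It shows that the Hebrew final letters are ordinary
letters with their own names, while the positional forms of an Arabic
letter exist only as compatibility characters that normalize back to
the single base letter.

```python
import unicodedata

# (regular, final) Hebrew pairs from the question, written as escapes
# so the source is not affected by bidirectional display.
HEBREW_PAIRS = [
    ("\u05DE", "\u05DD"),  # mem
    ("\u05E0", "\u05DF"),  # nun
    ("\u05E6", "\u05E5"),  # tsadi
    ("\u05E4", "\u05E3"),  # pe
    ("\u05DB", "\u05DA"),  # kaf
]

# One Arabic letter (BEH), chosen arbitrarily as an example, plus its
# isolated, final, initial and medial compatibility presentation forms.
ARABIC_BEH = "\u0628"
BEH_PRESENTATION_FORMS = "\uFE8F\uFE90\uFE91\uFE92"

# The joiner controls mentioned above.
ZWNJ = "\u200C"
ZWJ = "\u200D"


def describe(ch):
    """Return the code point and Unicode character name of ch."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def folds_to_base(form, base):
    """True if compatibility normalization (NFKC) maps form to base."""
    return unicodedata.normalize("NFKC", form) == base


if __name__ == "__main__":
    for regular, final in HEBREW_PAIRS:
        # Each final form is a separate letter; NFKC leaves it alone.
        print(describe(regular), "|", describe(final),
              "| folds:", folds_to_base(final, regular))
    for form in BEH_PRESENTATION_FORMS:
        # Each presentation form folds back to the one base letter.
        print(describe(form), "| folds:", folds_to_base(form, ARABIC_BEH))
    for control in (ZWNJ, ZWJ):
        print(describe(control))
```

In ordinary Arabic-script text only the base letter is stored and the
font picks the form; ZWNJ and ZWJ are the plain-text way to override
that choice.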

The standard is written in English . If you have trouble understanding
a particular section, read it again and again and again . . . Sit up
straight. Eat your vegetables. Do not mumble. -- _Pascal_, ISO 7185
