Why do the Hebrew Alphabetic Presentation Forms Exist

Doug Ewell doug at ewellic.org
Tue Jun 9 17:33:20 CDT 2020


abrahamgross at disroot.org wrote:

> Unicode encodes characters that other character sets have even though
> it normally wouldn't. So if I find a character set with a folded lamed
> they'd add it?

To elaborate a little on John's comment that "that's really not done anymore": Unicode more or less promised to encode everything that was present in existing, contemporary coded character sets. So if it was in ISO 8859-8, MS-DOS CP862, Windows CP1255, MARC-8 for Hebrew, etc., then it would be in Unicode as well. That's where the presentation forms came from, as mentioned earlier.
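
As a rough illustration (not something from the earlier messages), Python's standard unicodedata module can show how two of the Hebrew presentation forms are tied back to ordinary letters. The choice of U+FB25 and U+FB4F and the helper name describe() are mine, picked only for the example:

```python
import unicodedata

def describe(ch):
    """Return (character name, decomposition string) for one character."""
    return unicodedata.name(ch), unicodedata.decomposition(ch)

# Two code points from the Alphabetic Presentation Forms block,
# chosen for illustration only.
for ch in ("\uFB25", "\uFB4F"):
    name, decomp = describe(ch)
    # NFKC replaces a compatibility character with its ordinary equivalent(s).
    folded = unicodedata.normalize("NFKC", ch)
    targets = " ".join("U+%04X" % ord(c) for c in folded)
    print("U+%04X %s | %s | NFKC -> %s" % (ord(ch), name, decomp, targets))
```

Both characters carry compatibility decompositions to plain Hebrew letters, which fits their being encoded for compatibility rather than as distinct letters.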

This did not mean Unicode was obligated to conform retroactively to every coded character set introduced or updated *after* Unicode was published. It has certainly done so for some widely used character sets, particularly in East Asia, but there is no obligation for Unicode to add EWELLIC LETTER A just because I publish an 8-bit character set that contains that letter.

And this promise always applied to "coded character sets," meaning collections of mappings between code points (single-byte, double-byte, or multi-byte) and characters, used to represent plain text in computers. It didn't apply to glyph collections for typesetting, as in the TeX example below, and definitely not to charts of letters found in a book, with no corresponding code points, as in the JPEG image below.
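
To make "mapping between a code point and a character" concrete, here is a small sketch using the codecs that ship with Python. The byte values are the positions of the letter alef in three of the character sets named above; the names SAMPLES and decode_all() are just for this example:

```python
ALEF = "\u05D0"  # HEBREW LETTER ALEF

# Byte that each legacy coded character set assigns to alef.
SAMPLES = {
    "iso8859_8": b"\xE0",  # ISO 8859-8
    "cp1255": b"\xE0",     # Windows CP1255
    "cp862": b"\x80",      # MS-DOS CP862
}

def decode_all(samples):
    """Decode each sample byte with its own codec; return codec -> character."""
    return {codec: raw.decode(codec) for codec, raw in samples.items()}

for codec, ch in decode_all(SAMPLES).items():
    print("%-10s -> U+%04X (alef: %s)" % (codec, ord(ch), ch == ALEF))
```

Each set pairs a numeric code with a character, and that pairing is what makes round-trip conversion to and from Unicode possible. A glyph table in a font or a chart printed in a book has no such codes to map.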

> Here are 2 character sets with a folded lamed:
> https://i.imgur.com/iq8awBe.jpg – an אלף בינה with the standing and
> folded lameds as separate letters.
> https://www.tug.org/TUGboat/tb15-3/tb44haralambous-hebrew.pdf#page=12
> – A TeX typesetting module with the standing and folded lameds as
> separate characters for fine-grain control when the automatic system
> doesn't work.

--
Doug Ewell | Thornton, CO, US | ewellic.org
