Recent Hebrew Proposals

Sat Oct 25 16:43:47 CDT 2025

If QAMATS and SHEVA are to be split as suggested, I propose that the split variants be completely separate, i.e. new symbols for QAMATS QATAN and QAMATS GADOL, leaving the existing QAMATS as is for the undifferentiated QAMATS. This is necessary because there are cases of disagreement whether the QAMATS is QATAN or GADOL, for example in the name נָעֳמִי (Naomi). See https://hebrew-academy.org.il/%d7%a6%d6%b8%d7%94%d6%b3%d7%a8%d6%b7%d7%99%d6%b4%d7%9d-%d7%a0%d6%b8%d7%a2%d6%b3%d7%9e%d6%b4%d7%99-%d7%94%d7%92%d7%99%d7%99%d7%aa-%d7%a7%d7%9e%d7%a5-%d7%9c%d7%a4%d7%a0%d7%99-%d7%97%d7%98/

The situation with SHEVA is similar: there are cases of disagreement, for example in words like כִּתְבִי.

Best Regards,

Jonathan Rosenne

-----Original Message-----
From: Unicode <unicode-bounces at corp.unicode.org> On Behalf Of Mark E. Shoulson via Unicode
Sent: Friday, October 24, 2025 7:55 PM
To: unicode at corp.unicode.org
Subject: Recent Hebrew Proposals

I wasn't watching the document registry carefully enough; Hebrew proposals are often things I feel I can help with.  Let's see if I can avoid weighing in just for the sake of talking.  I have my doubts.

WRT the sheva na/heavy sheva, there were indeed already *some* imprints that made that distinction back when we proposed QAMATS QATAN, but they were quite few.  More printers want to make this distinction now; maybe typographic style has changed, and perhaps shva na does need encoding now.  The same goes for dagesh hazaq.  I have a scan (attached, if the list permits) from way back when, from a source (Koren) that didn't and doesn't distinguish dagesh qal from dagesh hazaq, but that nonetheless makes a subtle but distinct difference between a VAV with a dagesh and a VAV with a shuruq-dot.  Look at the end of the second word from the left: of the two vavs, the dot in the first one is just a bit higher than in the second.  That's on purpose.

With regard to the "double duty" that these characters would involve, well, there is something to that.  It is indeed much the same as when I proposed QAMATS QATAN and HOLAM HASER FOR VAV: there is a long tradition of NOT distinguishing these symbols, they were long considered the same symbol (even if they had different semantic meanings), and many (most?) printers will carry on not distinguishing them... but we want to support a growing segment of publishers that are making the distinction.  That's sort of the situation we're in, and I suppose the most completely unambiguous approach would be to leave, say, HEBREW POINT SHEVA for the lumpers and encode *both* HEBREW POINT SHEVA MOBILE and HEBREW POINT QUIESCENT SHEVA for the splitters.  But I think most here would agree that that would be excessive, and since it's the mobile sheva and the heavy dagesh that are being given new emphasized shapes for the most part, it makes sense to split them off and leave the rest undistinguished.
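
In practice the lumper/splitter arrangement means software has to treat the split and unsplit points as interchangeable when matching text. A minimal Python sketch of that, assuming a search tool that should match QAMATS QATAN (U+05C7) against QAMATS and HOLAM HASER FOR VAV (U+05BA) against HOLAM; the folding table and function names are hypothetical illustrations, not anything the standard defines:

```python
import unicodedata

# Existing "splitter" points mapped to the "lumper" point each was split
# from. Neither U+05C7 nor U+05BA has a canonical decomposition, so plain
# NFC/NFD normalization will never make them match U+05B8 / U+05B9.
_FOLD = str.maketrans({
    "\u05C7": "\u05B8",  # HEBREW POINT QAMATS QATAN -> HEBREW POINT QAMATS
    "\u05BA": "\u05B9",  # HEBREW POINT HOLAM HASER FOR VAV -> HEBREW POINT HOLAM
})


def fold_for_search(text: str) -> str:
    """Replace splitter points with their lumper forms, then apply NFD so
    that mark order is canonical before comparison."""
    return unicodedata.normalize("NFD", text.translate(_FOLD))


def same_for_search(a: str, b: str) -> bool:
    """True if a and b differ only in split vs. unsplit points."""
    return fold_for_search(a) == fold_for_search(b)


# KAF + DAGESH + QAMATS QATAN + LAMED vs. the same word with plain QAMATS
kol_split = "\u05DB\u05BC\u05C7\u05DC"
kol_lumped = "\u05DB\u05BC\u05B8\u05DC"
print(same_for_search(kol_split, kol_lumped))  # True
```

The folding is one-way and only builds a comparison key, so the stored text keeps the splitter's distinction. Any newly encoded SHEVA or DAGESH variant could be added to the same table in the same way.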

Document L2/25-237 draws a distinction between QAMATS QATAN and the case of ATNAH HAFUKH, and there is an important difference.  A distinct shape for QAMATS QATAN is a recent innovation; it was never part of classical Hebrew orthography but has been introduced within the past century(?) and gained traction, sufficient to be worth considering.  ATNAH HAFUKH actually has the opposite problem.  As we showed in the proposal for ATNAH HAFUKH, it formerly *was* written distinctly from YERAH BEN YOMO in old MSS, including the Aleppo Codex, and only later (probably with the advent of printing) were the two symbols conflated.  So even if it did not become common in current Hebrew printing to show them distinctly, it would still have been a good idea to disunify them in order to transcribe such MSS accurately.

Regarding L2/25-242, proposing "helper" accents for preposed/postposed accents, I am opposed.  These "helpers" were never considered "different symbols" from the real ones, only copies placed more conveniently to help the reader.  In fact, I would say that for Zarqa, after using U+05AE HEBREW ACCENT ZINOR for the "main" postposed accent, one should NOT use U+0598 HEBREW ACCENT ZARQA for the "helper", even though it has the right look and positioning (the names of these accents are a known anomaly, see https://www.unicode.org/notes/tn27/ appendix A).  Rather, one should use U+05AE HEBREW ACCENT ZINOR for both of them, and the font should know to position the non-final one differently.  Same for PASHTA: in my opinion, one should use U+0599 HEBREW ACCENT PASHTA for both the main and the helper, and not use U+05A8 HEBREW ACCENT QADMA.  After all, both symbols are pashtas!  It's just that one is written in the wrong place to help you out.
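
To make the recommendation concrete, a minimal Python sketch of how a word with a helper pashta would be encoded under it. The letters of תֹהוּ are used purely for illustration, and the helper function is hypothetical; Python's `unicodedata` module supplies the character names:

```python
import unicodedata

TAV, HE, VAV = "\u05EA", "\u05D4", "\u05D5"
HOLAM = "\u05B9"   # HEBREW POINT HOLAM
DAGESH = "\u05BC"  # HEBREW POINT DAGESH OR MAPIQ (also the shuruq dot on VAV)
PASHTA = "\u0599"  # HEBREW ACCENT PASHTA

# A word carrying a "helper" pashta on its stressed first syllable and the
# real, postpositive pashta on its last letter. Both copies are encoded
# as U+0599; U+05A8 HEBREW ACCENT QADMA is deliberately not used for the
# helper, and placing the non-final copy is left to the font.
word = TAV + HOLAM + PASHTA + HE + VAV + DAGESH + PASHTA


def accent_names(s: str) -> list[str]:
    """Names of the cantillation accents (U+0591..U+05AE) in s, in order."""
    return [unicodedata.name(c) for c in s if "\u0591" <= c <= "\u05AE"]


print(accent_names(word))
# ['HEBREW ACCENT PASHTA', 'HEBREW ACCENT PASHTA']
```

The zarqa case would look the same, with U+05AE HEBREW ACCENT ZINOR in both positions.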

Using the font to reposition things might be "fragile", but that doesn't make it wrong.  That kind of positioning really is the font's (and font-renderer's) job to keep straight, not the encoding's.  And BTW, I don't think I've ever seen a "helper DEHI" anywhere, so that one is a solution in search of a problem.  I know that MCE used different codings for those "helpers" (I regularly use MCE, still reading Hebrew texts encoded in plain ASCII letters, though I wouldn't recommend it to anyone); I'm not sure that argues much one way or the other.  MCE also, I think, encoded preposed accents *before* their letters, which is definitely contrary to Unicode's principles, and it coded VAV + HOLAM as HOLAM + VAV.  I don't see that this separate encoding helps much.
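
Unicode's rule is that a combining mark follows its base letter in the stored text, wherever it is drawn. A minimal Python sketch of a check that flags marks violating that rule, as accent-before-letter data of the kind described above would; the function name is hypothetical, and it only detects the problem rather than reordering anything:

```python
import unicodedata


def unattached_marks(text: str) -> list[int]:
    """Indices of combining marks that have no base letter before them.

    Unicode stores every mark after the letter it belongs to, including a
    preposed accent that is drawn at the start of its word. A mark at the
    very start of the text, or right after whitespace, therefore suggests
    input that still follows an accent-before-letter convention.
    """
    bad = []
    has_base = False
    for i, ch in enumerate(text):
        if unicodedata.category(ch).startswith("M"):  # Mn, Mc, Me
            if not has_base:
                bad.append(i)
        else:
            # Letters (and other non-space characters) can carry marks;
            # whitespace cannot.
            has_base = not ch.isspace()
    return bad


ALEF, BET = "\u05D0", "\u05D1"
TELISHA_GEDOLA = "\u05A0"  # a preposed accent

print(unattached_marks(ALEF + TELISHA_GEDOLA))              # []  (Unicode order)
print(unattached_marks(TELISHA_GEDOLA + ALEF))              # [0]
print(unattached_marks(BET + " " + TELISHA_GEDOLA + ALEF))  # [2]
```

Flagging rather than silently moving the mark is deliberate: deciding which letter an orphaned accent belongs to depends on the source's own convention.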

I remember that years ago someone asked to encode the "MEAYLA" or "MAYELA" accent, on the grounds that it is considered a distinct cantillation by scholars, even though it is identical in appearance and placement to TIPEHA and can only be distinguished by the fact that it appears in the same (possibly hyphenated) word as a "siluq" (end-of-verse) or ETNAHTA.  (For that matter, the unification of "siluq" with METEG would be a far, far nastier problem to deal with, were it not for the fact that you can tell the end of the verse by the following SOF PASUQ.)  But there was never any distinction between meayla and tipeha except in scholarly debate (the meayla even has the same effect on the sequence of cantillations that the tipeha has, even though it's technically a connective and not a disjunctive).

And indeed, I believe the same person also proposed disunifying PASEQ from the line used to make a legarmehh (or shalshelet gedola).  (I remember she once asked if there were people making Unicode decisions who were NOT font designers, as if the problem was that we were all mere grunts making fonts and not students of Hebrew.)  And again, there was really no reason: nobody (almost?) ever made that distinction in writing, and Unicode is here to encode things that are *written*, not things that we think about.

Now, perhaps the situation is different.  Maybe the paseq vs legarmehh line is starting to be recognized by printers, as in the examples shown.  Is it widespread enough to really matter?  That's another question.  I think the previous suggestion (not an actual proposal) was for a "LEGARMEHH LINE" code point distinct from PASEQ, with the PASEQ line presumably doing "double duty"; this proposal is the other way around, encoding a "PASEQ NOT LEGARMEHH" point.  That feels more complicated and harder to understand, but it makes sense numerically, since there are more LEGARMEHHs than PASEQs.  (The problem is that the symbol has always been commonly referred to as PASEQ, with people saying "and then a legarmehh has a line after it that looks like a PASEQ...")  Again, this is a "qamats qatan" type problem, not an "atnah hafukh" type problem ("No manuscript distinguishes paseq from legarmeh in the way that some recent publications do", from the proposal).  And lest you think I am insensitive to the situation, years ago I did write a program for parsing Biblical verses according to their cantillation, and it runs up against this exact problem.  (I just haven't bothered to address the issue seriously.)
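
Roughly the shape of the problem such a parser runs into, as a minimal Python sketch. It assumes prose-book accentuation, PASEQ written as its own space-delimited token, and a line after a word bearing SHALSHELET read as shalshelet gedola; the function names are hypothetical, and the point is the branch that cannot be decided from the code points alone:

```python
MUNAH = "\u05A3"       # HEBREW ACCENT MUNAH
SHALSHELET = "\u0593"  # HEBREW ACCENT SHALSHELET
PASEQ = "\u05C0"       # HEBREW PUNCTUATION PASEQ


def classify_line(word: str) -> str:
    """Guess what the vertical line after `word` stands for (prose books)."""
    if SHALSHELET in word:
        return "shalshelet gedola"
    if MUNAH in word:
        # Same code point either way; plain text cannot settle this case.
        return "legarmeh or paseq (ambiguous)"
    return "paseq"


def lines_in_verse(verse: str) -> list[tuple[str, str]]:
    """Pair each PASEQ token with the word before it and a guessed reading.
    Assumes PASEQ is written as a separate, space-delimited token."""
    tokens = verse.split()
    return [
        (tokens[i - 1], classify_line(tokens[i - 1]))
        for i, tok in enumerate(tokens)
        if tok == PASEQ and i > 0
    ]


# ALEF+MUNAH, then PASEQ, then BET
for _word, reading in lines_in_verse("\u05D0\u05A3 \u05C0 \u05D1"):
    print(reading)  # legarmeh or paseq (ambiguous)
```

A distinct code point for either reading would turn the ambiguous branch into a plain lookup.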

~mark