Recent Hebrew Proposals

Fri Oct 24 11:54:40 CDT 2025

I wasn't watching the document registry carefully enough; Hebrew 
proposals are often things I feel I can help with.  Let's see if I can 
avoid weighing in just for the sake of talking.  I have my doubts.

WRT the sheva na/heavy sheva, there were indeed already *some* imprints 
that made that distinction back when we proposed QAMATS QATAN, but they 
were quite few.  There are more printers who want to make this
distinction now; maybe typographic style has changed, and perhaps sheva
na does need encoding now.  The same for dagesh hazaq.  I have a scan
(attached, if the list permits) from way back when from a source (Koren) 
that didn't and doesn't distinguish dagesh qal from dagesh hazaq... but 
nonetheless has a subtle but distinct difference between a VAV with a 
dagesh and a VAV with a shuruq-dot (see at the end of the second word 
from the left, two vavs, but the dot in the first one is just a bit 
higher than the second?  That's on purpose).

With regard to the "double duty" that these characters would involve, 
well, there is something to that.  It is indeed much the same as when I 
proposed QAMATS QATAN and HOLAM HASER FOR VAV: there is a long tradition 
of NOT distinguishing these symbols, they were long considered the same 
symbol (even if they had different semantic meanings), and many (most?) 
printers will carry on not distinguishing them... but we want to support 
a growing segment of publishers that are making the distinction.  That's 
sort of the situation we're in, and I suppose the most completely 
unambiguous approach would be to leave, say HEBREW POINT SHEVA for the 
lumpers and encode *both* HEBREW POINT SHEVA MOBILE and also HEBREW 
POINT QUIESCENT SHEVA for the splitters.  But I think most here would 
agree that that would be excessive, and since it's the mobile sheva and 
the heavy dagesh that are being given new emphasized shapes for the most 
part, it makes sense to split them off and leave the rest undistinguished.

Document L2/25-237 draws a distinction between QAMATS QATAN and the case 
of ATNAH HAFUKH, and there is an important difference.  A distinct shape 
for QAMATS QATAN is a recent innovation; it was never part of classical 
Hebrew orthography but has been introduced within the past century(?) 
and gained traction, sufficient to be worth considering.  ATNAH HAFUKH 
actually has the opposite problem.  As we showed in the proposal for 
ATNAH HAFUKH, it formerly *was* written distinctly from YERAH BEN YOMO 
in old MSS, including the Aleppo Codex, and only later (probably with 
the advent of printing) were the two symbols conflated.  So even if it 
did not become common in current Hebrew printing to show them 
distinctly, it would still have been a good idea to disunify them in 
order to transcribe such MSS accurately.

Regarding L2/25-242, proposing "helper" accents for preposed/postposed 
accents, I am opposed.  These "helpers" were never considered "different 
symbols" from the real ones, but only copies placed more conveniently to 
help the reader.  In fact, I would say that for Zarqa, after using 
U+05AE HEBREW ACCENT ZINOR for the "main" postposed accent, one should 
NOT use U+0598 HEBREW ACCENT ZARQA for the "helper" even though it has
the right look and positioning (the names of these accents are a known 
anomaly, see https://www.unicode.org/notes/tn27/ appendix A).  Rather, 
one should use U+05AE HEBREW ACCENT ZINOR for both of them, and the font 
should know to position the non-final one differently.  Same for PASHTA; 
in my opinion, one should use U+0599 HEBREW ACCENT PASHTA for both the 
main and helper, and not use U+05A8 HEBREW ACCENT QADMA.  After all, 
both symbols are pashtas!  Just one is written in the wrong place to 
help you out.
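
To make the suggested encoding concrete, here is a minimal sketch in
Python.  It assumes (and this is only an assumption) that a word
containing both U+05A8 and U+0599 is using the QADMA as a "helper"
pashta, and likewise for U+0598 next to U+05AE.  The function name
unify_helpers and the sample word are made up for illustration; this is
not taken from any proposal or existing tool.

```python
# Hypothetical sketch: fold "helper" accents into the main accent's code point.
ZARQA = "\u0598"   # HEBREW ACCENT ZARQA
PASHTA = "\u0599"  # HEBREW ACCENT PASHTA
QADMA = "\u05A8"   # HEBREW ACCENT QADMA
ZINOR = "\u05AE"   # HEBREW ACCENT ZINOR


def unify_helpers(word: str) -> str:
    """Re-encode a helper accent with the same code point as the main one.

    Assumption: the helper only ever appears in the same word as the main
    postposed accent, so a lone QADMA or ZARQA is left untouched.
    """
    if PASHTA in word:
        word = word.replace(QADMA, PASHTA)
    if ZINOR in word:
        word = word.replace(ZARQA, ZINOR)
    return word


# Illustrative word: mem, segol, QADMA (helper), lamed, segol,
# final kaf, sheva, PASHTA (main accent).
with_helper = "\u05DE\u05B6\u05A8\u05DC\u05B6\u05DA\u05B0\u0599"
unified = unify_helpers(with_helper)
```

After this, both marks in the word are U+0599, and placing the
non-final one over the stressed syllable is left to the font.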

Using the font to reposition things might be "fragile", but that doesn't 
make it wrong.  That kind of positioning really is the font's (and 
font-renderer's) job to keep straight, not the encoding.  And BTW, I 
don't think I've ever seen a "helper DEHI" anywhere, so that one is a 
solution in search of a problem.  I know that the MCE used different 
codings for those "helpers" (I regularly use MCE, still reading Hebrew 
texts encoded in plain ASCII letters, though I wouldn't recommend it to 
anyone); I'm not sure that argues much one way or the other.  MCE also, 
I think, encoded preposed accents *before* their letters, which is 
definitely contrary to Unicode's principles, as well as coding VAV + 
HOLAM as HOLAM + VAV.  I don't see that this separate encoding
helps much.

I remember years ago someone was asking to encode the "MEAYLA" or 
"MAYELA" accent, on the grounds that it is considered a distinct 
cantillation by scholars, even though it is identical in appearance and 
placement to TIPEHA and can only be distinguished by the fact that it 
appears in the same (possibly hyphenated) word as a "siluq" 
(end-of-verse) or ETNAHTA.  (For that matter, the unification of "siluq" 
with METEG is a far, far nastier problem to deal with, were it not for 
the fact that you can tell the end-of-verse by the following SOF 
PASUQ).  But there was never any distinction between meayla and tipeha 
except for scholarly debate (the meayla even has the same effect on the 
sequence of cantillations that the tipeha has, even though it's 
technically a connective and not a disjunctive).
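
The contextual rule described above can be sketched in a few lines of
Python.  The assumptions here are mine: the text has been split on
whitespace (so a maqaf-joined group stays in one token), SOF PASUQ is
attached to the last word of the verse, and siluq is encoded as U+05BD.
The function names are hypothetical.

```python
# Hypothetical sketch: tell meayla from tipeha, and siluq from meteg,
# purely from context, since the code points are the same.
ETNAHTA = "\u0591"    # HEBREW ACCENT ETNAHTA
TIPEHA = "\u0596"     # HEBREW ACCENT TIPEHA
METEG = "\u05BD"      # HEBREW POINT METEG (also used for siluq)
SOF_PASUQ = "\u05C3"  # HEBREW PUNCTUATION SOF PASUQ


def has_siluq(word: str) -> bool:
    """A U+05BD counts as siluq only in a word that ends with SOF PASUQ."""
    return word.endswith(SOF_PASUQ) and METEG in word


def is_meayla_candidate(word: str) -> bool:
    """A tipeha-shaped mark sharing a word with etnahta or siluq."""
    return TIPEHA in word and (ETNAHTA in word or has_siluq(word))
```

Note that has_siluq cannot say *which* U+05BD in the final word is the
siluq when that word also carries an ordinary meteg; it only says that
one of them is.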

And indeed, I believe the same person also proposed disunifying PASEQ 
from the line used to make a legarmehh (or shalshelet gedola).  (I 
remember she once asked if there were people making Unicode decisions 
who were NOT font designers, as if the problem was that we were all mere 
grunts making fonts and not students of Hebrew.)  And again, there was 
really no reason: nobody (almost?) ever made that distinction in 
writing, and Unicode is here to encode things that are *written*, not 
things that we think about.

Now, perhaps the situation is different.  Maybe the paseq vs legarmehh 
line is starting to be recognized by printers, as in the examples 
shown.  Is it widespread enough to really matter?  That's another 
question.  I think the previous suggestion (not an actual proposal) was 
for there to be a "LEGARMEHH LINE" codepoint as distinct from PASEQ, and 
I guess the PASEQ line would do "double duty"; this proposal is the 
other way around, coding a "PASEQ NOT LEGARMEHH" point.  That feels more 
complicated and harder to understand, but makes sense numerically, since 
there are more LEGARMEHHs than PASEQs.  (The problem is that the symbol 
has always been commonly referred to as PASEQ, with people saying "and 
then a legarmehh has a line after it that looks like a PASEQ...") Again, 
this is a "qamats qatan" type problem, not an "atnah hafukh" type 
problem ("No manuscript distinguishes paseq from legarmeh in the way 
that some recent publications do", from the proposal).  And lest you 
think I am insensitive to the situation, I have indeed years ago written 
a program for parsing Biblical verses according to cantillations that 
runs up against this exact problem.  (I just haven't bothered to address 
the issue seriously.)
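
For what it's worth, here is roughly where a parser hits the wall,
sketched in Python.  This is not the program mentioned above; it is a
made-up illustration (the name classify_bar is hypothetical) that looks
only at the word immediately before a U+05C0 and covers only the prose
accents, ignoring the other legarmehh forms found in the poetic books.

```python
# Hypothetical sketch: what can be said about a U+05C0 from the word before it.
from typing import Literal

SHALSHELET = "\u0593"  # HEBREW ACCENT SHALSHELET
MUNAH = "\u05A3"       # HEBREW ACCENT MUNAH

BarKind = Literal["shalshelet gedola", "ambiguous", "paseq"]


def classify_bar(preceding_word: str) -> BarKind:
    """Classify the vertical bar that follows preceding_word."""
    if SHALSHELET in preceding_word:
        # The bar is part of the shalshelet gedola.
        return "shalshelet gedola"
    if MUNAH in preceding_word:
        # Could be a legarmehh, or an ordinary munah followed by a real
        # paseq; the encoding alone cannot decide.
        return "ambiguous"
    return "paseq"
```

The "ambiguous" branch is the whole problem: resolving it takes
knowledge of the surrounding accent sequence, or a distinct code point.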

~mark
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Nitztavvu_small.png
Type: image/png
Size: 18829 bytes
Desc: not available
URL: <https://corp.unicode.org/pipermail/unicode/attachments/20251024/09a3ddca/attachment.png>