Recent Hebrew Proposals
Mark E. Shoulson
mark at kli.org
Fri Oct 24 11:54:40 CDT 2025
I wasn't watching the document registry carefully enough; Hebrew
proposals are often things I feel I can help with. Let's see if I can
avoid weighing in just for the sake of talking. I have my doubts.
WRT the sheva na/heavy sheva, there were indeed already *some* imprints
that made that distinction back when we proposed QAMATS QATAN, but they
were quite few. More printers want to make this distinction now (maybe
typographic style has changed), and perhaps sheva na does need encoding
now. The same goes for dagesh hazaq. I have a scan (attached, if the
list permits), from way back when, from a source (Koren) that didn't and
doesn't distinguish dagesh qal from dagesh hazaq... but nonetheless has
a subtle but distinct difference between a VAV with a dagesh and a VAV
with a shuruq-dot. See the end of the second word from the left: there
are two vavs, and the dot in the first one is just a bit higher than in
the second. That's on purpose.
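For anyone who doesn't handle the encoding daily: in Unicode as it stands, the two dots in that scan are literally the same character. A short check with Python's standard `unicodedata` module (an illustration only, not part of any proposal):

```python
import unicodedata

VAV = "\u05D5"              # U+05D5 HEBREW LETTER VAV
DAGESH_OR_MAPIQ = "\u05BC"  # U+05BC HEBREW POINT DAGESH OR MAPIQ

# Unicode has no separate shuruq code point: both readings are the same
# two-character sequence, letter first and mark second.
vav_with_dagesh = VAV + DAGESH_OR_MAPIQ  # consonantal vav doubled by dagesh hazaq
shuruq = VAV + DAGESH_OR_MAPIQ           # the vowel "u"

for ch in shuruq:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# True: plain text cannot record the dot-height difference in the scan.
print(vav_with_dagesh == shuruq)

# True: the presentation form U+FB35 HEBREW LETTER VAV WITH DAGESH
# canonically decomposes to the very same pair.
print(unicodedata.normalize("NFD", "\uFB35") == shuruq)
```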
With regard to the "double duty" that these characters would involve,
well, there is something to that. It is indeed much the same as when I
proposed QAMATS QATAN and HOLAM HASER FOR VAV: there is a long tradition
of NOT distinguishing these symbols, they were long considered the same
symbol (even if they had different semantic meanings), and many (most?)
printers will carry on not distinguishing them... but we want to support
a growing segment of publishers that are making the distinction. That's
sort of the situation we're in, and I suppose the most completely
unambiguous approach would be to leave, say, HEBREW POINT SHEVA for the
lumpers and encode *both* HEBREW POINT SHEVA MOBILE and HEBREW POINT
QUIESCENT SHEVA for the splitters. But I think most here would
agree that that would be excessive, and since it's the mobile sheva and
the heavy dagesh that are being given new emphasized shapes for the most
part, it makes sense to split them off and leave the rest undistinguished.
Document L2/25-237 draws a distinction between QAMATS QATAN and the case
of ATNAH HAFUKH, and there is an important difference. A distinct shape
for QAMATS QATAN is a recent innovation; it was never part of classical
Hebrew orthography but has been introduced within the past century(?)
and has gained enough traction to be worth considering. ATNAH HAFUKH
actually has the opposite problem. As we showed in the proposal for
ATNAH HAFUKH, it formerly *was* written distinctly from YERAH BEN YOMO
in old MSS, including the Aleppo Codex, and only later (probably with
the advent of printing) were the two symbols conflated. So even if it
did not become common in current Hebrew printing to show them
distinctly, it would still have been a good idea to disunify them in
order to transcribe such MSS accurately.
Regarding L2/25-242, proposing "helper" accents for preposed/postposed
accents, I am opposed. These "helpers" were never considered "different
symbols" from the real ones, but only copies placed more conveniently to
help the reader. In fact, I would say that for Zarqa, after using
U+05AE HEBREW ACCENT ZINOR for the "main" postposed accent, one should
NOT use U+0598 HEBREW ACCENT ZARQA for the "helper" even though it has
the right look and positioning (the names of these accents are a known
anomaly, see https://www.unicode.org/notes/tn27/ appendix A). Rather,
one should use U+05AE HEBREW ACCENT ZINOR for both of them, and the font
should know to position the non-final one differently. Same for PASHTA;
in my opinion, one should use U+0599 HEBREW ACCENT PASHTA for both the
main and helper, and not use U+05A8 HEBREW ACCENT QADMA. After all,
both symbols are pashtas! Just one is written in the wrong place to
help you out.
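The "one code point, let positioning sort it out" approach can be pictured with a toy classifier. This is a minimal sketch under an assumed rule: the copy of U+05AE on the word's last letter is the real postpositive accent, and any earlier copy is the helper. A real font would do this with its own contextual positioning rules, not Python; the function names are hypothetical, and the test string is a schematic letter sequence, not an actual word.

```python
ZINOR = "\u05AE"  # U+05AE HEBREW ACCENT ZINOR, used here for main and helper alike


def is_hebrew_letter(ch: str) -> bool:
    """True for base letters U+05D0..U+05EA (final forms included)."""
    return "\u05D0" <= ch <= "\u05EA"


def classify_zinor(word: str) -> list[str]:
    """Label each ZINOR in `word` as 'main' or 'helper' (assumed rule)."""
    letter_positions = [i for i, ch in enumerate(word) if is_hebrew_letter(ch)]
    last_letter = letter_positions[-1] if letter_positions else -1
    labels = []
    base = -1  # index of the letter that the following marks attach to
    for i, ch in enumerate(word):
        if is_hebrew_letter(ch):
            base = i
        elif ch == ZINOR:
            labels.append("main" if base == last_letter else "helper")
    return labels


# Schematic sequence: ALEF+ZINOR, BET, GIMEL+ZINOR (not a real word).
print(classify_zinor("\u05D0\u05AE\u05D1\u05D2\u05AE"))  # ['helper', 'main']
```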
Using the font to reposition things might be "fragile", but that doesn't
make it wrong. That kind of positioning really is the font's (and
font-renderer's) job to keep straight, not the encoding. And BTW, I
don't think I've ever seen a "helper DEHI" anywhere, so that one is a
solution in search of a problem. I know that MCE used different codings
for those "helpers" (I still regularly read Hebrew texts in MCE, encoded
in plain ASCII letters, though I wouldn't recommend it to anyone); I'm
not sure that argues much one way or the other. MCE also, I think,
encoded preposed accents *before* their letters, which is definitely
contrary to Unicode's principles, and coded VAV + HOLAM as HOLAM + VAV.
I don't see that this separate encoding helps much.
I remember years ago someone was asking to encode the "MEAYLA" or
"MAYELA" accent, on the grounds that it is considered a distinct
cantillation by scholars, even though it is identical in appearance and
placement to TIPEHA and can only be distinguished by the fact that it
appears in the same (possibly hyphenated) word as a "siluq"
(end-of-verse) or ETNAHTA. (For that matter, the unification of "siluq"
with METEG would be a far, far nastier problem to deal with, were it
not for the fact that you can identify the end of a verse by the
following SOF PASUQ.) But there was never any distinction between
meayla and tipeha except in scholarly debate (the meayla even has the
same effect on the sequence of cantillations that the tipeha has, even
though it's technically a connective and not a disjunctive).
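Both contextual tests mentioned here can be sketched in a few lines. This is a rough heuristic under stated assumptions, not a cantillation parser: words are split on whitespace (so maqaf-joined words stay together), SOF PASUQ is assumed to be attached to the verse's last word, the last METEG in that word is assumed to be the siluq, and TIPEHA is read as meayla whenever it shares a word with a siluq or an ETNAHTA. The function name and sample string are hypothetical.

```python
METEG = "\u05BD"      # U+05BD HEBREW POINT METEG (also serves as siluq)
SOF_PASUQ = "\u05C3"  # U+05C3 HEBREW PUNCTUATION SOF PASUQ
TIPEHA = "\u0596"     # U+0596 HEBREW ACCENT TIPEHA (also serves as meayla)
ETNAHTA = "\u0591"    # U+0591 HEBREW ACCENT ETNAHTA


def label_marks(verse: str) -> list[tuple[int, str]]:
    """Return (word_index, label) for every METEG and TIPEHA in `verse`.

    Assumed heuristics: the last METEG in the word carrying SOF PASUQ is
    siluq; TIPEHA in a word with siluq or ETNAHTA is meayla.
    """
    labels = []
    for w, word in enumerate(verse.split()):  # maqaf is not whitespace
        ends_verse = word.endswith(SOF_PASUQ)
        siluq_pos = word.rfind(METEG) if ends_verse else -1
        has_major = siluq_pos != -1 or ETNAHTA in word
        for i, ch in enumerate(word):
            if ch == METEG:
                labels.append((w, "siluq" if i == siluq_pos else "meteg"))
            elif ch == TIPEHA:
                labels.append((w, "meayla" if has_major else "tipeha"))
    return labels


# Schematic two-word "verse" (letters arbitrary, not real text); the
# second word is maqaf-joined and ends with SOF PASUQ.
sample = "\u05D0\u05BD\u05D1\u0596 \u05D2\u0596\u05BE\u05D3\u05BD\u05C3"
print(label_marks(sample))
# [(0, 'meteg'), (0, 'tipeha'), (1, 'meayla'), (1, 'siluq')]
```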
And indeed, I believe the same person also proposed disunifying PASEQ
from the line used to make a legarmehh (or shalshelet gedola). (I
remember she once asked if there were people making Unicode decisions
who were NOT font designers, as if the problem was that we were all mere
grunts making fonts and not students of Hebrew.) And again, there was
really no reason: nobody (almost?) ever made that distinction in
writing, and Unicode is here to encode things that are *written*, not
things that we think about.
Now, perhaps the situation is different. Maybe the paseq vs legarmehh
line is starting to be recognized by printers, as in the examples
shown. Is it widespread enough to really matter? That's another
question. I think the previous suggestion (not an actual proposal) was
for there to be a "LEGARMEHH LINE" codepoint as distinct from PASEQ, and
I guess the PASEQ line would do "double duty"; this proposal is the
other way around, coding a "PASEQ NOT LEGARMEHH" point. That feels more
complicated and harder to understand, but makes sense numerically, since
there are more LEGARMEHHs than PASEQs. (The problem is that the symbol
has always been commonly referred to as PASEQ, with people saying "and
then a legarmehh has a line after it that looks like a PASEQ...") Again,
this is a "qamats qatan" type problem, not an "atnah hafukh" type
problem ("No manuscript distinguishes paseq from legarmeh in the way
that some recent publications do", from the proposal). And lest you
think I am insensitive to the situation, I did, years ago, write a
program for parsing Biblical verses according to their cantillation,
and it runs up against this exact problem. (I just haven't bothered to address
the issue seriously.)
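The ambiguity such a parser hits can be shown directly. This is a minimal sketch, assuming one code point (U+05C0) serves for every vertical line and that a line after a SHALSHELET word is the shalshelet gedola line. After a word with MUNAH, the same line may complete a legarmeh *or* be a true paseq after a conjunctive, and the text alone can't say which. The function name and sample string are hypothetical.

```python
PASEQ = "\u05C0"       # U+05C0 HEBREW PUNCTUATION PASEQ (the only "line" code point)
MUNAH = "\u05A3"       # U+05A3 HEBREW ACCENT MUNAH
SHALSHELET = "\u0593"  # U+0593 HEBREW ACCENT SHALSHELET


def classify_paseq_lines(verse: str) -> list[str]:
    """Label each PASEQ by the word before it (assumed rules, see above)."""
    # Pad each line with spaces so it becomes its own token.
    tokens = verse.replace(PASEQ, f" {PASEQ} ").split()
    labels = []
    for prev, tok in zip([""] + tokens, tokens):
        if tok != PASEQ:
            continue
        if SHALSHELET in prev:
            labels.append("shalshelet gedola (probably)")
        elif MUNAH in prev:
            labels.append("ambiguous: legarmeh or munah + paseq")
        else:
            labels.append("paseq")
    return labels


# Schematic sequence (letters arbitrary): munah word, shalshelet word,
# plain word, each followed by a line.
sample = "\u05D0\u05A3 \u05C0 \u05D1\u0593\u05C0 \u05D2 \u05C0 \u05D3"
for label in classify_paseq_lines(sample):
    print(label)
```

A separate code point on either side of the split (a "LEGARMEHH LINE" or a "PASEQ NOT LEGARMEHH") would let the first branch resolve without guessing.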
~mark
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Nitztavvu_small.png
Type: image/png
Size: 18829 bytes
Desc: not available
URL: <https://corp.unicode.org/pipermail/unicode/attachments/20251024/09a3ddca/attachment.png>