Why do the Hebrew Alphabetic Presentation Forms Exist?

Wáng Yifán 747.neutron at gmail.com
Mon Jun 8 02:23:41 CDT 2020


Since CJK ideographs were mentioned...

They are different from most other Unicode code points in several ways, namely:
- Most of the substantive discussion takes place under the supervision
of ISO, rather than UTC. It's one of the few fields whose description in
ISO/IEC 10646 is more informative than that in the Unicode Standard.
For practical knowledge, see especially the ISO standard's Annexes P
and S.
- Whether to encode two characters separately is decided mainly by
differences in structure, i.e. sub-character formation, in addition to
semantics, because Han characters are compositional by nature, unlike
most phonetic scripts, where each letter means what it means only as a
whole shape (Α is not a Λ with a hyphen, is it?).
- The questionable quality of CJK Extension B characters is an open secret.
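
As a rough check of how the variant rows quoted below are spread across
blocks, here is a minimal Python sketch. The block ranges are hard-coded
from the Unicode code charts rather than read from any library, and
`block_of` and `describe` are names made up for this illustration. It says
nothing about the quality of any individual character; it only shows where
each one was encoded.

```python
# Hard-coded ranges for three CJK blocks (copied from the Unicode code
# charts); any other code point is reported as "other".
CJK_BLOCKS = [
    (0x3400, 0x4DBF, "CJK Extension A"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x20000, 0x2A6DF, "CJK Extension B"),
]


def block_of(ch):
    """Return the name of the block (from CJK_BLOCKS) containing ch."""
    cp = ord(ch)
    for first, last, name in CJK_BLOCKS:
        if first <= cp <= last:
            return name
    return "other"


def describe(row):
    """Return (code point, block) pairs for every character in row."""
    return [(f"U+{ord(ch):04X}", block_of(ch)) for ch in row]


if __name__ == "__main__":
    # Three of the rows from the quoted message.
    for row in ["氣気气", "也𠃟𦫴𦬀𠔄𠃒", "教敎斅𢽾𧧿𤕝"]:
        for code_point, block in describe(row):
            print(code_point, block)
        print()
```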

On Mon, Jun 8, 2020 at 4:57, Abraham Gross via Unicode <unicode at unicode.org> wrote:

>
> If this is the case, then why do the CJK blocks have tons of alternatives for the same character (not counting the compatibility ideographs, which were added just for compatibility with other encodings)? If you look at old dictionaries, these are listed as alternative forms of the same character, which you might see some fonts use. The meaning is exactly the same.
>
> Some examples (there are tons and tons more):
> 也𠃟𦫴𦬀𠔄𠃒
> 足𠯁𠯣
> 之㞢𠔇𡳿
> 是昰
> 乎𠂞𠂠
> 事亊𠭆
> 氣気气
> 中𠁦𠁧𠁩𠔈𠔗
> 典𠔓
> 教敎斅𢽾𧧿𤕝
>
>
> On June 7, 2020, at 10:27, "Mark E. Shoulson via Unicode" <unicode at unicode.org> wrote:
>
> > On 6/7/20 7:46 AM, Richard Wordingham via Unicode wrote:
> >
> >> On Sat, 6 Jun 2020 23:58:42 -0600
> >> Anshuman Pandey via Unicode <unicode at unicode.org> wrote:
> >>
> >>> Hi Abraham,
> >>>
> >>> If you’re seriously thinking of submitting a proposal for a new
> >>> Hebrew character, please consider getting in touch with Deborah
> >>> Anderson, Michael Everson, or me. We’d be happy to help you figure
> >>> out the suitability of encoding the character in question or figuring
> >>> out ways to represent it in plain text, if need be.
> >>
> >> It doesn't belong in plain text. It only becomes useful once line
> >> breaks and character spacing are known.
> >>
> >> Richard.
> >
> > I agree.  Sorry, pretty typography is nice and everything, but if bent LAMED is anything, it's at
> > best a presentation form (and even that is a hard sell.)  You show ANYONE a word spelled with any
> > combination of bent and straight LAMEDs and ask how it's spelled, they'll just say "LAMED" for each
> > one.  Unicode encodes different *characters*, symbols that have a different *meaning* in text, not
> > things that happen to look different.  A U+05BA HOLAM HASER FOR VAV means not just "a dot like
> > U+05B9 only shifted over a little," it means that there is something *different* going on: VAV plus
> > HOLAM usually means one thing (a VAV as mater lectionis for an /o/ vowel), this is a consonantal
> > VAV followed by a vowel.  In spelling it out, you could call one a holam malé, but not the other.
> > A QAMATS QATAN is not just a qamats that looks a little different, it is a grammatically distinct
> > character, and moreover one that cannot be deduced algorithmically by looking at the letters around
> > it.  What you're talking about is a LAMED and a LAMED.  They are two *glyphs* for the same
> > character, and Unicode doesn't encode glyphs (anymore?)
> >
> > ~mark
>
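
The distinction Mark draws can be seen in the character data itself. Here
is a minimal sketch using Python's standard `unicodedata` module; the exact
output depends on the Unicode version bundled with the interpreter, and
`summarize` is a name made up for this illustration. The two HOLAMs and the
two QAMATSes are separate characters with no decomposition, while an
existing presentation form such as U+FB25 HEBREW LETTER WIDE LAMED carries
a compatibility mapping back to the plain LAMED.

```python
import unicodedata


def summarize(cp):
    """Return (code point, character name, decomposition) for cp."""
    ch = chr(cp)
    return (
        f"U+{cp:04X}",
        unicodedata.name(ch, "<unnamed>"),
        unicodedata.decomposition(ch) or "(none)",
    )


if __name__ == "__main__":
    # HOLAM, HOLAM HASER FOR VAV, QAMATS, QAMATS QATAN, LAMED, WIDE LAMED
    for cp in (0x05B9, 0x05BA, 0x05B8, 0x05C7, 0x05DC, 0xFB25):
        print(*summarize(cp), sep="  ")

    # Compatibility normalization folds the presentation form back to LAMED.
    print(unicodedata.normalize("NFKC", "\uFB25") == "\u05DC")
```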


