Zero-Width Joiner U+200D

Asmus Freytag asmusf at ix.netcom.com
Tue Mar 14 20:57:45 CDT 2023


The language that gave rise to the original confusion

“U+200D zero width joiner is intended to produce a more connected 
rendering of adjacent characters than would otherwise be the case, if 
possible....”

is not in contradiction (as originally claimed), because "adjacent" 
clearly refers to characters "adjacent" to the ZWJ, not "characters that 
would be adjacent across a ZWJ". There's nothing in the language that 
supports that (mis-)reading. However, simply changing the language to

“U+200D zero width joiner is intended to produce a more connected 
rendering of characters *adjacent to it* than would otherwise be the 
case, if possible...”

would prevent that reading. Yet with that change, the sentence becomes 
completely impossible to scan.

The problem stems partially from the desire to make the text on ZWJ and 
ZWNJ appear (anti-)symmetric. However, this ignores the fact that they 
behave very differently when placed near spaces and start/end of line or 
text.

I would suggest a slight change:

Joiner. U+200D zero width joiner *requests* a more connected rendering 
of adjacent characters than would otherwise be the case.

where "requests" replaces the curious "is intended to produce". And we can 
delete the "if possible" because it's only a request, and no request can 
be satisfied in situations where a more connected rendering is not 
possible. The remaining text below the bullets already covers that case, 
should there be any doubts.

However, I would suggest we add a bullet:

* A typical use of ZWJ is to show the connected form of a character 
without a visible neighbor.
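
To make that bullet concrete, here is a minimal sketch in Python of the 
character sequences involved. The choice of ARABIC LETTER HEH (U+0647) and 
the helper name describe() are only illustrative assumptions, not anything 
taken from the standard's text. Whether a connected form actually appears 
depends on the font and rendering engine, since ZWJ is only a request.

```python
import unicodedata

ZWJ = "\u200D"   # ZERO WIDTH JOINER
HEH = "\u0647"   # ARABIC LETTER HEH, a dual-joining letter (illustrative choice)


def describe(text):
    """List each code point in text as 'U+XXXX NAME' (hypothetical helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# Sequences that request a given form of HEH with no visible neighbor.
# A ZWJ before the letter (in logical order) requests a connection on that
# side; a ZWJ after it requests a connection on the other side.
requested_forms = {
    "isolated": HEH,
    "initial": HEH + ZWJ,
    "final": ZWJ + HEH,
    "medial": ZWJ + HEH + ZWJ,
}

for form, text in requested_forms.items():
    print(form, describe(text))
```

The sequences differ only in where the invisible ZWJ sits, which is exactly 
the "adjacent to it" reading discussed above.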



On 3/14/2023 5:30 PM, Peter Constable via Unicode wrote:
>
> From Unicode 15, section 9.2 (p. 375):
>
> “The Non-joiner and the Joiner. The Unicode Standard provides two 
> user-selectable formatting codes: U+200C zero width non-joiner and 
> U+200D zero width joiner. The use of a joiner adjacent to a suitable 
> letter permits that letter to form a cursive connection without a 
> visible neighbor. This provides a simple way to encode some special 
> cases, such as exhibiting a connecting form in isolation, as shown in 
> Figure 9-2.”
>
> Later in that section (p. 383), ZWJ is listed in the Join_Causing set 
> of Arabic joining types.
>
> It seems to me the text is describing the original intent as Asmus 
> described.
>
> Peter
>
> *From:* Unicode <unicode-bounces at corp.unicode.org> *On Behalf Of 
> *Jukka K. Korpela via Unicode
> *Sent:* Tuesday, February 21, 2023 4:56 AM
> *To:* Asmus Freytag <asmusf at ix.netcom.com>
> *Cc:* unicode at corp.unicode.org
> *Subject:* Re: Zero-Width Joiner U+200D
>
>  Asmus Freytag via Unicode (unicode at corp.unicode.org) wrote:
>
>     I think we need to look at whether the language accurately
>     reflects what we were trying to say. I do know that it was revised
>     at one point, when the use of ZWJ was generalized beyond cursive
>     connection.
>
> It seems that this took place as early as in Unicode 2.
>
>     The interpretation you suggest may be an inadvertent result of
>     that change, or someone had found out why the usage that I always
>     understood as intended is for some reason problematic. In that
>     case, it should be excluded more explicitly, in my view.
>
> In fact, reading chapter 23 onwards, I now see the use of ZWJ’s around 
> a character to ask for isolated form. It was just so far from the 
> place that described ZWJ and ZWNJ between adjacent characters, giving 
> the impression that this is their only use. Perhaps it would help to 
> remove the word “adjacent” from “U+200D zero width joiner is intended 
> to produce a more connected rendering of adjacent characters than 
> would otherwise be the case, if possible.”
>
> The text describes the use of ZWJ for isolated form and shows this in 
> example 23-1. Sorry for the confusion I caused.
>
> So the answer to Andreas’ question is “yes, it should”, with the value 
> of “should” roughly as “is intended to, according to the Unicode 
> standard, but a program that renders Unicode characters is not 
> required to obey, or even understand, such rendering suggestions”.
>
> Jukka
>