ZWJ, ZWNJ and Markup languages.

Martin J. Dürst duerst at it.aoyama.ac.jp
Fri Nov 27 19:42:15 CST 2015


On 2015/11/28 04:55, Plug Gulp wrote:

> The Unicode standard 8.0 states in chapter 23, section titled "Cursive
> Connection and Ligatures"(printed page #814, PDF page #850) that:
>
> "The zero width joiner and non-joiner characters are designed for use
> in plain text; they should not be used where higher-level ligation and
> cursive control is available. (See Unicode Technical Report #20,
> “Unicode in XML and Other Markup Languages,” for more information.)"
>
> I went through TR#20 and did not find any mention that ZWJ and ZWNJ
> are not suitable for use with markup languages. On the contrary, ZWJ
> and ZWNJ are listed in TR#20 under section 4 titled "Format Characters
> Suitable for Use with Markup".
>
> So are ZWJ and ZWNJ characters suitable for use with markup languages
> such as HTML and XML?

They are indeed suitable for use with markup languages. They are so 
suitable that they were already provided as entities in RFC 2070, which 
is now historic, and have been carried forward into HTML 4.0 and later. 
Please see http://tools.ietf.org/html/rfc2070#section-4.2.

I'm not sure why Unicode 8.0 has the text it has; at the least, this 
should be toned down somewhat to say "they may be replaced by 
higher-level ligation and cursive control mechanisms if available".
Thanks for finding this!

The main reason they are suitable is that these characters apply at a 
single point; creating markup such as <zwj/> and <zwnj/> would not give 
any advantage over &zwj;/&zwnj;.
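
To make the "single point" argument concrete, here is a minimal Python sketch. It assumes only the standard library's html and unicodedata modules; the helper name describe and the sample string "fi&zwnj;sh" are made up for illustration. It shows that each entity decodes to exactly one code point, so the entity and the raw character are interchangeable once the markup is parsed.

```python
import html
import unicodedata


def describe(entity: str) -> str:
    """Decode one HTML character reference and describe the resulting code point.

    Hypothetical helper for illustration; expects a reference that decodes
    to exactly one character.
    """
    char = html.unescape(entity)
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


zwj_info = describe("&zwj;")
zwnj_info = describe("&zwnj;")
print(zwj_info)
print(zwnj_info)

# The entity marks a single position in the text: after decoding, the
# string is identical to one that contains the raw character directly.
decoded = html.unescape("fi&zwnj;sh")
same = decoded == "fi\u200csh"
print(same)
```

Whether a renderer then suppresses the ligature depends on the font and layout engine, which this sketch does not cover.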

Markup is at its best when it can be applied to nested spans of text. It 
is not inconceivable that something like <do_not_ligate_inside>...
</do_not_ligate_inside> could occasionally be useful, but I have 
difficulty imagining a use case off the top of my head.

I'll file a bug report with the content of this email.

Regards,   Martin.

