Bidi reordering of soft hyphen

Richard Wordingham richard.wordingham at ntlworld.com
Tue Apr 1 15:10:23 CDT 2014


On Tue, 1 Apr 2014 12:51:11 +0700
James Clark <jjc at jclark.com> wrote:

> Suppose I have a paragraph (uppercase = RTL):
> 
>    CARROT IS car\u00ADrot IN ENGLISH
> 
> and the paragraph gets broken at the soft hyphen.
> 
> Is the correct ordering for the first line
> 
>   car- SI TORRAC
> 
> or
> 
>   -car SI TORRAC
> 
> ? I did not succeed in deducing the answer from UAX#9.  Soft hyphen
> has bidi class BN, which means it gets removed in stage X9, and so,
> if I have understood correctly, doesn't have a defined embedding
> level.
> 
> I'm guessing the correct ordering is the first one, but I don't trust
> my instincts here. (In particular, I wondered whether this was
> analogous to the case where rule L1 resets embedding levels so that
> trailing whitespace is at the visual end of the line.)

There is no conformance requirement on the location of the soft
hyphen.  Indeed, there is no requirement on whether it is rendered at
all (TUS Section 16.2).  As the treatment of the soft hyphen is
language-dependent even in unidirectional text, I am afraid the
treatment is down to good taste and the language(s) involved.  (E.g.,
is this Arabic text effectively embedding English text within an overall
Thai context?)

As U+2010 HYPHEN would result in text like 'car-', in an
English-influenced context I would also go with 'car-'.
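
For what it is worth, the U+2010 case can be checked with a rough
sketch like the one below.  It is not a full UAX#9 implementation: the
embedding levels are assigned by hand (assuming the hyphen, class ON,
sits between 'car' and 'rot' and so takes their level under rule N1),
uppercase ASCII stands in for RTL letters as in the example above, and
reorder_line is just an illustrative name.  Only rule L2 is actually
carried out.

```python
import unicodedata

# Bidi classes as reported by Python's copy of the Unicode Character Database.
SOFT_HYPHEN_CLASS = unicodedata.bidirectional("\u00ad")  # soft hyphen
HYPHEN_CLASS = unicodedata.bidirectional("\u2010")       # U+2010 HYPHEN


def reorder_line(chars, levels):
    """Apply rule L2 to one line: from the highest level down to the
    lowest odd level, reverse every contiguous run of characters whose
    level is at least the level being processed."""
    chars = list(chars)
    if not chars:
        return ""
    highest = max(levels)
    lowest_odd = min((lv for lv in levels if lv % 2), default=highest + 1)
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(chars):
            if levels[i] >= level:
                j = i
                while j < len(chars) and levels[j] >= level:
                    j += 1
                chars[i:j] = reversed(chars[i:j])
                i = j
            else:
                i += 1
    return "".join(chars)


# First line in logical order, with U+2010 in place of the soft hyphen.
first_line = "CARROT IS car\u2010"
# Hand-assigned levels (an assumption, not computed): RTL paragraph text
# at level 1, the embedded Latin run plus its hyphen at level 2.
levels = [1] * len("CARROT IS ") + [2] * len("car\u2010")

visual = reorder_line(first_line, levels)
print(SOFT_HYPHEN_CLASS, HYPHEN_CLASS)  # BN ON
print(visual)                           # car‐ SI TORRAC
```

With these levels the hyphen stays attached to the visual end of 'car'.
Nothing comparable can be computed for U+00AD itself, since a BN
character is removed by X9 before any level is resolved.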

Richard.


