Bidi reordering of soft hyphen

Asmus Freytag asmusf at ix.netcom.com
Tue Apr 1 16:43:38 CDT 2014


I think this calls for an implementation note on UAX#9 along these lines.
-------------------------
During line breaking, if a line is broken at the location of a SHY, the 
text around the line break may change. A common case is the replacement 
of the invisible SHY by a visible HYPHEN, but see Section x.x in the 
Unicode Standard.

For the purposes of the Bidi Algorithm, apply steps .. to .. after any 
substitutions have been made, using the directional classes for the 
substituted characters, instead of a single BN for the SHY character.

<example>

Note: no special action need be taken for SHY characters in the middle 
of a line, unless they are rendered as visible glyphs in a "show hidden 
characters" mode. In the latter case, the recommendation would be to 
treat the visible symbol substituted for the SHY as having bidi class ON.
------------------------
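
To make the note concrete, here is a minimal Python sketch of the 
substitution step. It is only an illustration: the function name 
classes_after_shy_break and the default replacement (U+2010 HYPHEN, the 
"common case" above) are my assumptions, and a real line breaker may 
substitute something else. It relies on the standard unicodedata module 
for the bidi classes.

```python
import unicodedata

SHY = "\u00ad"     # SOFT HYPHEN, bidi class BN
HYPHEN = "\u2010"  # HYPHEN, bidi class ON


def classes_after_shy_break(line, replacement=HYPHEN):
    """Return (text, bidi classes) for one line, in logical order.

    If the line was broken at a SHY (the SHY is its last character),
    the SHY is replaced first, so the class array reflects the
    substituted characters rather than a single BN.
    A SHY in the middle of the line is left alone and stays BN.
    """
    if line.endswith(SHY):
        line = line[:-1] + replacement
    return line, [unicodedata.bidirectional(ch) for ch in line]


if __name__ == "__main__":
    # Hebrew alef, bet, gimel, space, "car", SHY at the line break.
    text, classes = classes_after_shy_break("\u05d0\u05d1\u05d2 car\u00ad")
    print(classes)
```

The replacement may be longer than one character (for instance HYPHEN 
followed by LRM), which is why the class array cannot be assumed to hold 
exactly one entry at the position of the original SHY.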

I am not sure whether -car CBA or car- CBA is the right answer, nor 
whether the substitution will always be limited to the preceding line. 
(Old orthography German had Bäc<SHY>ker turning into Bäk-|ker, where 
I've used | to show the line ending.) Those are details that the UBA 
should be ignorant of. The important thing is that the array of bidi 
directional classes is not constrained to contain a single entry for BN 
at the location of the original SHY.

If "car- CBA" is the right answer then the substitution would have to be 
HYPHEN plus LRM to get this to come out right, but that would be under 
the control of the line-breaking conventions, and not legislated by the UBA.
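
The effect of adding LRM can be checked with a deliberately simplified 
sketch of the algorithm. This is not a conforming UBA implementation: it 
assumes one paragraph level (0 or 1), treats every character that is not 
strong L, R or AL as a neutral, and applies only rules N1, N2, I1, I2 
and L2 (no numbers, explicit embeddings or isolates). The name 
visual_order is hypothetical.

```python
import unicodedata


def visual_order(text, para_level):
    """Return logical indices in visual (left-to-right) order.

    Simplified: para_level must be 0 or 1, and only strong and
    neutral characters are handled (rules N1, N2, I1, I2, L2).
    """
    embedding = "R" if para_level % 2 else "L"
    strong = []
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            strong.append("L")
        elif cls in ("R", "AL"):   # simplification: AL counts as R
            strong.append("R")
        else:
            strong.append(None)    # everything else is a neutral here

    n = len(text)
    resolved = list(strong)
    i = 0
    while i < n:
        if strong[i] is not None:
            i += 1
            continue
        j = i
        while j < n and strong[j] is None:
            j += 1
        before = strong[i - 1] if i > 0 else embedding  # sos
        after = strong[j] if j < n else embedding       # eos
        # N1: same direction on both sides; otherwise N2.
        fill = before if before == after else embedding
        for k in range(i, j):
            resolved[k] = fill
        i = j

    # I1 / I2: the direction opposite to the paragraph goes up a level.
    levels = [para_level + (1 if d != embedding else 0) for d in resolved]

    # L2: from the highest level down to 1, reverse each contiguous
    # sequence of characters at that level or higher.
    order = list(range(n))
    for level in range(max(levels, default=0), 0, -1):
        i = 0
        while i < n:
            if levels[order[i]] < level:
                i += 1
                continue
            j = i
            while j < n and levels[order[j]] >= level:
                j += 1
            order[i:j] = order[i:j][::-1]
            i = j
    return order


if __name__ == "__main__":
    base = "\u05d0\u05d1\u05d2 car"
    print(visual_order(base + "\u2010", 1))        # hyphen alone
    print(visual_order(base + "\u2010\u200e", 1))  # hyphen plus LRM
```

Under these assumptions, in an RTL paragraph the hyphen alone resolves 
to R and lands at the far left ("-car CBA"), while HYPHEN plus LRM 
resolves to L and stays attached to the end of "car" ("car- CBA"). With 
para_level 0 and a hyphen-minus, the sketch gives the order 
[2 1 0 3 4 5 6 7] from Ken's trace below.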

A./

On 4/1/2014 1:31 PM, Whistler, Ken wrote:
>
> Richard Wordingham noted:
>
> > As U+2010 HYPHEN would result in text like 'car-', in an English
> > influenced context I would also go with 'car-'.
>
> That's always a possibility, I suppose, but I'm not sure what
> "English influenced context" means here.
>
> The examples I just gave were for a RTL paragraph context.
> In a LTR paragraph context, the same input would end up in
> a very different order:
>
> Trace: Entering br_UBA_ReverseLevels [L2]
> Current State: 19
>   Text:        05D0 05D1 05D2 0020 0063 0061 0072 002D
>   Bidi_Class:     R    R    R    L    L    L    L    L
>   Levels:         1    1    1    0    0    0    0    0
>   Runs: <L-----------------------------------L>
>   Order:      [2 1 0 3 4 5 6 7]
>
> And you get the display:
>
> CBA car-
> --------->
>
> As opposed to:
>
> -car CBA
> <---------
>
> In either case, the hyphen-minus (or hyphen), ends up at the *end of
> the line*.
>
> My take is that *if* I am going to insert a visible glyph at the point
> of the SHY, it would probably be best to insert it at the actual line
> break at the end of the line, to be in the same position as an explicit
> hyphen-minus with the same line break.
>
> --Ken