Bidi reordering of soft hyphen

Mark Davis ☕️ mark at macchiato.com
Wed Apr 2 01:27:23 CDT 2014


I tend to agree with Roozbeh and Behdad. I would expect to find the visible
appearance of the hyphen "replacing" the letters that were broken off from
the last word. That is, if the word was "beekeeper", I'd expect to see:

.... bee- .....

That would be the case no matter where the word occurred, and no matter what the
direction of the paragraph or surrounding text. (If the SHY occurred at a
directional boundary, I'd also say we don't care much...)

In any event, once we come up with an agreed recommendation, I'd suggest an
implementation note like Asmus describes, but rather than talk about
algorithmic steps, just point out the desired visual behavior (since there
are many ways to do it).



Mark <https://google.com/+MarkDavis>

 *— Il meglio è l’inimico del bene (the best is the enemy of the good) —*


On 1 April 2014 23:43, Asmus Freytag <asmusf at ix.netcom.com> wrote:

> I think this calls for an implementation note on UAX #9 along these lines.
> -------------------------
> During line breaking, if a line is broken at the location of a SHY, the
> text around the line break may change. A common case is the replacement of
> the invisible SHY by a visible HYPHEN, but see Section x.x in the Unicode
> Standard.
>
> For the purposes of the Bidi Algorithm, apply steps .. to .. after any
> substitutions have been made, using the directional classes for the
> substituted characters, instead of a single BN for the SHY character.
>
> <example>
>
> Note that no special action need be taken for a SHY character in the middle
> of a line, unless it is rendered as a visible glyph in a "show hidden
> characters" mode. In the latter case, the recommendation would be to treat
> the visible symbol substituted for the SHY as having bidi class ON.
> ------------------------
>
> I am not sure whether "-car CBA" or "car- CBA" is the right answer, nor
> whether the substitution will always be limited to the preceding line. (In
> the old German orthography, Bäc<SHY>ker turned into Bäk-|ker, where I've
> used | to show the line ending.) Those are details that the UBA should be
> ignorant about. The important thing is that the array of bidi directional
> classes is not constrained to contain a single entry for BN at the location
> of the original SHY.
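As a rough illustration of the note above, here is a minimal Python sketch (not part of UAX #9). The hypothetical `break_at_shy` helper stands in for a line breaker that replaces the SHY with U+2010 HYPHEN, and `bidi_classes` reads each character's Bidi_Class from Python's standard `unicodedata` module. It shows the class array handed to the reordering steps holding ON for the visible hyphen rather than BN for the SHY.

```python
import unicodedata

SHY = "\u00ad"     # U+00AD SOFT HYPHEN, Bidi_Class BN
HYPHEN = "\u2010"  # U+2010 HYPHEN, Bidi_Class ON


def break_at_shy(word: str, shy_index: int) -> tuple[str, str]:
    """Hypothetical line-breaking step: split `word` at the SHY found at
    `shy_index` and replace that SHY with a visible HYPHEN at the end of
    the first line. Only the SHY itself is substituted here."""
    if word[shy_index] != SHY:
        raise ValueError("no SHY at the given index")
    return word[:shy_index] + HYPHEN, word[shy_index + 1:]


def bidi_classes(text: str) -> list[str]:
    """Bidi_Class of each character, from Python's unicodedata module."""
    return [unicodedata.bidirectional(ch) for ch in text]


word = "bee" + SHY + "keeper"
first, second = break_at_shy(word, 3)

print(bidi_classes(word))   # ['L', 'L', 'L', 'BN', 'L', 'L', 'L', 'L', 'L', 'L']
print(first, second)        # bee‐ keeper (the hyphen is U+2010)
print(bidi_classes(first))  # ['L', 'L', 'L', 'ON']
```

Real hyphenation conventions may rewrite more than the SHY (the old-orthography German case changes the preceding letter as well), so a fuller version would recompute the classes for the whole substituted span.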
>
> If "car- CBA" is the right answer then the substitution would have to be
> HYPHEN plus LRM to get this to come out right, but that would be under the
> control of the line-breaking conventions, and not legislated by the UBA.
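The LRM in this suggestion matters because of how a trailing neutral is resolved. Below is a deliberately simplified Python sketch of UAX #9 rules N1 and N2: it handles only a single neutral between two strong types and omits EN/AN handling. It assumes, as in Ken's trace, that the line is resolved as if it were the whole paragraph, so in an RTL paragraph the end-of-sequence type after the hyphen is R.

```python
import unicodedata

HYPHEN = "\u2010"  # U+2010 HYPHEN, Bidi_Class ON
LRM = "\u200e"     # U+200E LEFT-TO-RIGHT MARK, Bidi_Class L


def resolve_neutral(prev_strong: str, next_strong: str, embedding_level: int) -> str:
    """Simplified N1/N2 for a neutral between two strong directions
    ('L' or 'R'; EN/AN handling omitted). N1: if both sides agree, the
    neutral takes that direction. N2: otherwise it takes the embedding
    direction (R for odd levels, L for even)."""
    if prev_strong == next_strong:
        return prev_strong
    return "R" if embedding_level % 2 == 1 else "L"


# RTL paragraph (embedding level 1); the line ends with "car" + HYPHEN.
print(unicodedata.bidirectional(HYPHEN))  # ON
# Without LRM: the hyphen sits between 'r' (L) and end-of-sequence (R).
print(resolve_neutral("L", "R", 1))       # R, so the hyphen stays at level 1
# With LRM after it: the hyphen sits between 'r' (L) and the LRM (L).
print(resolve_neutral("L", unicodedata.bidirectional(LRM), 1))  # L, level 2 after I2
```

Resolved as R, the hyphen stays at level 1 and lands at the visual left edge ("-car CBA"). Resolved as L, rule I2 raises it to level 2 alongside "car", which yields "car- CBA".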
>
> A./
>
>
> On 4/1/2014 1:31 PM, Whistler, Ken wrote:
>
> Richard Wordingham noted:
>
> > As U+2010 HYPHEN would result in text like 'car-', in an English
> > influenced context I would also go with 'car-'.
>
> That's always a possibility, I suppose, but I'm not sure what
> "English influenced context" means here.
>
> The examples I just gave were for a RTL paragraph context.
> In a LTR paragraph context, the same input would end up in
> a very different order:
>
> Trace: Entering br_UBA_ReverseLevels [L2]
> Current State: 19
>   Text:        05D0 05D1 05D2 0020 0063 0061 0072 002D
>   Bidi_Class:     R    R    R    L    L    L    L    L
>   Levels:         1    1    1    0    0    0    0    0
>   Runs:        <L-----------------------------------L>
>
>   Order:      [2 1 0 3 4 5 6 7]
>
> And you get the display:
>
> CBA car-
> --------->
>
> As opposed to:
>
> -car CBA
> <---------
>
> In either case, the hyphen-minus (or hyphen) ends up at the *end of the
> line*.
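Ken's trace is the input to rule L2 of UAX #9. A minimal Python sketch of L2 for a single line reproduces both displays. The LTR levels are copied from the trace; the RTL levels are resolved by hand (an assumption, treating the line as the whole paragraph). As in the displays, A, B and C stand in for U+05D0 to U+05D2.

```python
def reorder_l2(levels: list[int]) -> list[int]:
    """Sketch of UAX #9 rule L2 for one line: from the highest level down
    to the lowest odd level (including levels not present), reverse every
    maximal run of characters at that level or higher. Returns logical
    indices in visual left-to-right order."""
    order = list(range(len(levels)))
    if not levels:
        return order
    highest = max(levels)
    lowest = min(levels)
    lowest_odd = lowest if lowest % 2 == 1 else lowest + 1
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] < level:
                i += 1
                continue
            # Extend j to the end of the run at this level or higher.
            j = i
            while j < len(order) and levels[order[j]] >= level:
                j += 1
            order[i:j] = order[i:j][::-1]
            i = j
    return order


def visual(chars: list[str], levels: list[int]) -> str:
    """Join characters in the visual order produced by reorder_l2."""
    return "".join(chars[i] for i in reorder_l2(levels))


# Logical text from the trace: 05D0 05D1 05D2 0020 0063 0061 0072 002D.
chars = ["A", "B", "C", " ", "c", "a", "r", "-"]
ltr_levels = [1, 1, 1, 0, 0, 0, 0, 0]  # from the trace (LTR paragraph)
rtl_levels = [1, 1, 1, 1, 2, 2, 2, 1]  # hand-resolved (RTL paragraph)

print(reorder_l2(ltr_levels))     # [2, 1, 0, 3, 4, 5, 6, 7]
print(visual(chars, ltr_levels))  # CBA car-
print(visual(chars, rtl_levels))  # -car CBA

# Hyphen plus an invisible LRM (shown as ""), hand-resolved RTL levels.
print(visual(chars + [""], [1, 1, 1, 1, 2, 2, 2, 2, 2]))  # car- CBA
```

The last case corresponds to the HYPHEN plus LRM substitution Asmus mentions: once the hyphen is at level 2 it is reversed together with "car" and no longer sits at the visual line end.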
>
> My take is that *if* I am going to insert a visible glyph at the point of
> the SHY, it would probably be best to insert it at the actual line break at
> the end of the line, to be in the same position as an explicit hyphen-minus
> with the same line break.
>
> --Ken
>
> _______________________________________________
> Unicode mailing list
> Unicode at unicode.org
> http://unicode.org/mailman/listinfo/unicode