Character Sequences of Uncertain Rendering (was: Version linking?)

Sat Aug 26 23:06:04 CDT 2017

On Sat, 26 Aug 2017 21:52:19 +0200
Philippe Verdy via Unicode <unicode at unicode.org> wrote:

> 2017-08-26 21:28 GMT+02:00 Richard Wordingham via Unicode <
> unicode at unicode.org>:  

> Of course SHY in this use is not suitable, but who knows if one will
> not need this to split in two parts what would be otherwise a single
> cluster (possibly reordered by canonical reordering if one needs to
> split between two Indic matras: this would suggest there's a need for
> a new "empty base consonant" for that Indic script, but SHY (U+00AD)
> should probably not have the correct effect if it also inserts an
> undesired line break opportunity, independently of which glyph
> would be rendered and the position (first or second line) where
> it would be rendered if the linebreak is honored).

I am confused as to what conceivable case you have in mind.  An example
would help.  I wonder if I'm misunderstanding what you mean by
'canonical reordering'.  Do you mean the order of codepoints, or the
arrangement of glyphs?  CGJ is available to preserve a specific
ordering of codepoints, though it is completely redundant in most Indic
scripts.
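
To illustrate the point about CGJ, here is a minimal sketch using Python's
standard unicodedata module.  The example characters (a Latin base with a
dot above and a dot below, and Devanagari KA with two vowel signs) and the
helper name survives_normalization are chosen purely for illustration and
do not come from the thread.  The sketch assumes that 'canonical
reordering' means the reordering of combining marks during normalization.

```python
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER, canonical combining class 0


def survives_normalization(text, form="NFC"):
    """Return True if normalization leaves the codepoint order unchanged.

    Hypothetical helper, named here only for this illustration.
    """
    return unicodedata.normalize(form, text) == text


dot_above = "\u0307"  # COMBINING DOT ABOVE, combining class 230
dot_below = "\u0323"  # COMBINING DOT BELOW, combining class 220

# Without CGJ, canonical reordering moves the lower-class mark first.
plain = "q" + dot_above + dot_below

# With CGJ between the marks, the class-0 character blocks the reordering.
guarded = "q" + dot_above + CGJ + dot_below

# Devanagari KA + VOWEL SIGN I + VOWEL SIGN U: both matras have combining
# class 0, so normalization never reorders them and CGJ adds nothing.
matras = "\u0915\u093f\u0941"

for label, s in (("plain", plain), ("guarded", guarded), ("matras", matras)):
    classes = [unicodedata.combining(c) for c in s]
    print(label, classes, survives_normalization(s))
```

Marks whose canonical combining class is 0 are never reordered by
normalization, which is why CGJ is redundant wherever a script's dependent
vowel signs all have class 0.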

It is a fact that aksharas do get split between lines in manuscripts,
undesirable though it may be.  In a transcription intended to preserve
a division into lines, one would probably use NBSP at such a point,
and worry less about attempting to preserve the structure of the
line-broken akshara.  It seems that Unicode only supports marking word
boundaries, or their absence, insofar as they provide or prohibit line
breaks.

Richard.