UAX #9: applicability of higher-level protocols to bidi plaintext

Shai Berger via Unicode unicode at unicode.org
Wed Jul 18 03:51:36 CDT 2018


On Mon, 16 Jul 2018 17:40:50 -0700
Ken Whistler via Unicode <unicode at unicode.org> wrote:
> 
> So your complaint seems to boil down to the claim that if you
> transmit "Hello, world!" to a process which then renders it 
> conformantly according to the Unicode Standard (including
> UBA), then that process must somehow know *and honor* 
> your intent that it display in a LTR directional context. That
> information, however, is explicitly *not* contained in
> the plain text string there, and has to be conveyed by means of a 
> higher-level protocol.
> (E.g. HTML markup as dir="ltr", etc.)
> 
I believe this is an inaccurate description, but indeed the
discrepancy is at the root of the issue here.

The UBA defines a default algorithm for determining the directionality
of plain text paragraphs. My claim is that in the absence of an agreed
or conveyed higher-level protocol, this default must be respected.
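To make that default concrete: it is rules P2 and P3 of UAX #9. Take the first strong character (bidi class L, R or AL), skipping anything inside an isolate. R or AL gives an RTL paragraph; anything else, including no strong character at all, gives LTR. What follows is a minimal Python sketch using the standard library's unicodedata.bidirectional(). It assumes the input is already a single paragraph (no P1 splitting is done), and the function name is hypothetical.

```python
import unicodedata

# Bidi classes that open an isolate (UAX #9).
ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}


def default_paragraph_level(paragraph):
    """Return 0 (LTR) or 1 (RTL) for one paragraph, per rules P2 and P3."""
    depth = 0  # number of isolate initiators currently open
    for ch in paragraph:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ISOLATE_INITIATORS:
            depth += 1
        elif bidi == "PDI":
            if depth > 0:
                depth -= 1
        elif depth == 0:
            # P2/P3: the first strong character outside any isolate decides.
            if bidi == "L":
                return 0
            if bidi in ("R", "AL"):
                return 1
    # P3: no strong character found, so the paragraph is LTR.
    return 0


print(default_paragraph_level("Hello, world!"))             # → 0
print(default_paragraph_level("\u05d0\u05e0\u05d9 email"))  # → 1 (Hebrew first)
```

Under this default, "Hello, world!" is an LTR paragraph without any markup. Only a higher-level protocol (HL1, e.g. HTML dir="rtl") would replace the computed level.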

> If the receiving process, by whatever means, has raised its hand and 
> says, effectively, "I assume a RTL context for all text display",
> that is its right. You can't complain if it displays your "Hello,
> world!" as shown above. Well, you *can* complain, but you wouldn't be
> correct. Basically, you and the receiving process do not share the
> same assumptions about the higher-level protocol involved which
> specifies paragraph direction.
> 

This, essentially, boils down to a claim that the default is not really
a default, but itself must be the subject of agreement between sides.
My view is that expressed by FAQ #bidi7 -- a higher-level protocol is
an agreement. It can be explicit (e.g. HTML) or implicit (e.g. the
convention that log files are to be read LTR), but it cannot be
applied in a void, or else interoperability is lost.

> OR, you are just unhappy about the bidirectional
> rendering conundrums
> of some edge cases for the UBA.

I wish they were edge cases -- while the "Hello, World!" example is a bit
of a contrivance, the "SESU REHTO DNA email ROF plaintext REFERP I"
example is quite central to the UBA, and represents an extremely common
case; Hebrew paragraphs with embedded English words make up at least
several percent of all paragraphs written in Hebrew about technology, for
example.
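To show how much the paragraph direction matters here, the following toy Python sketch renders the logical string "I PREFER plaintext FOR email AND OTHER USES" both ways. It is a deliberately simplified model, not a UBA implementation. Following the uppercase convention of this example and of the examples in UAX #9, ASCII uppercase letters stand in for Hebrew (R) letters, lowercase for Latin (L), and everything else is neutral. Only rules P2/P3 (without isolates), N1/N2, I1/I2 and L2 are modelled; numbers, explicit embeddings, brackets and mirroring are ignored. The para_level argument stands in for a direction imposed by a higher-level protocol (HL1). All function names are hypothetical.

```python
def bidi_class(ch):
    # Toy convention: ASCII uppercase stands in for RTL letters,
    # ASCII lowercase for LTR letters, everything else is neutral.
    if "A" <= ch <= "Z":
        return "R"
    if "a" <= ch <= "z":
        return "L"
    return "N"


def first_strong_level(types):
    # Simplified P2/P3 (this toy model has no isolates).
    for t in types:
        if t == "R":
            return 1
        if t == "L":
            return 0
    return 0


def resolve_neutrals(types, para_level):
    # N1/N2: a run of neutrals takes the direction of the strong types on
    # both sides when they agree, otherwise the embedding direction.
    # The paragraph edges count as the embedding direction.
    embedding = "R" if para_level % 2 else "L"
    resolved = list(types)
    i = 0
    while i < len(resolved):
        if resolved[i] != "N":
            i += 1
            continue
        j = i
        while j < len(resolved) and resolved[j] == "N":
            j += 1
        before = resolved[i - 1] if i > 0 else embedding
        after = resolved[j] if j < len(resolved) else embedding
        fill = before if before == after else embedding
        resolved[i:j] = [fill] * (j - i)
        i = j
    return resolved


def embedding_levels(types, para_level):
    # I1/I2: characters opposite to the paragraph direction go up one level.
    opposite = "L" if para_level % 2 else "R"
    return [para_level + 1 if t == opposite else para_level for t in types]


def reorder_line(text, levels):
    # L2: from the highest level down to the lowest odd level, reverse
    # every maximal run of characters at that level or higher.
    chars, levels = list(text), list(levels)
    if not chars:
        return text
    lowest = min(levels)
    lowest_odd = lowest if lowest % 2 else lowest + 1
    for level in range(max(levels), lowest_odd - 1, -1):
        i = 0
        while i < len(chars):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and levels[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            levels[i:j] = levels[i:j][::-1]
            i = j
    return "".join(chars)


def display(text, para_level=None):
    # para_level=None means "no higher-level protocol": use the UBA default.
    types = [bidi_class(ch) for ch in text]
    if para_level is None:
        para_level = first_strong_level(types)
    resolved = resolve_neutrals(types, para_level)
    return reorder_line(text, embedding_levels(resolved, para_level))


logical = "I PREFER plaintext FOR email AND OTHER USES"
print(display(logical))                # → SESU REHTO DNA email ROF plaintext REFERP I
print(display(logical, para_level=0))  # → REFERP I plaintext ROF email SESU REHTO DNA
```

With no protocol in play, the first strong character ("I") makes the paragraph RTL, and the sentence reads in order from right to left. Forcing LTR splits the Hebrew phrase around the English words, so the line no longer reads in order in either direction.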

On Mon, 16 Jul 2018 21:51:32 -0700
Asmus Freytag via Unicode <unicode at unicode.org> wrote:

> [The Unicode Standard's] conformance clause is written to allow
> implementations to solve real-world issues without becoming formally
> non-conformant.

I accept that this was the intention; I claim that, as things are
currently written, they cause more real-world issues than they solve.

The only example given here of a real-world issue served by abolishing
the UBA defaults is performance degradation on some special files --
which are just as easy to treat specially, as Eli described in the case
of Emacs and logs. One other consideration raised boils down to, "it's
better to make some texts completely unreadable than to present some
other texts readably, but with the wrong alignment".

The trade-off you seem to prefer makes the "plain text is universally
readable" idea, from the core Unicode definition, inapplicable to BiDi
text.

Why?

Thanks,
	Shai
