Unicode String Models

Thu Sep 13 00:08:19 CDT 2018

On Wed, Sep 12, 2018 at 11:37 AM Hans Åberg via Unicode
<unicode at unicode.org> wrote:
> The idea is to extend Unicode itself, so that those bytes can be represented by legal codepoints.

Extending Unicode itself would likely create more problems than it
would solve. Extending the value space of Unicode scalar values would
be extremely disruptive for systems whose design is deeply committed
to the current definitions of UTF-16 and UTF-8 staying unchanged.
Assigning scalar values within the current Unicode scalar value space
to currently malformed bytes would lose information: such a scalar
value could no longer be traced back to either malformed bytes or the
well-formed encoding of that same scalar value.
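
To make the information loss concrete, here is a minimal sketch in
Python. The mapping is purely hypothetical: the base U+F700 (a Private
Use Area code point) and the error handler name "hypothetical-pua" are
made up for illustration and are not part of any proposal or standard.

```python
import codecs

# Hypothetical scheme: map each malformed byte 0xNN to the scalar value
# U+F700 + NN. The base is an arbitrary Private Use Area pick.
HYPOTHETICAL_BASE = 0xF700


def _map_malformed_to_scalar(err):
    """Error handler: replace undecodable bytes with ordinary scalar values."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad_bytes = err.object[err.start:err.end]
    replacement = "".join(chr(HYPOTHETICAL_BASE + b) for b in bad_bytes)
    # Resume decoding right after the bytes that were replaced.
    return replacement, err.end


codecs.register_error("hypothetical-pua", _map_malformed_to_scalar)

# A malformed byte, decoded through the hypothetical mapping.
from_malformed = b"\xff".decode("utf-8", "hypothetical-pua")

# The well-formed UTF-8 encoding of the very same scalar value.
from_well_formed = "\uf7ff".encode("utf-8").decode("utf-8", "hypothetical-pua")

# Both inputs yield the same string, so the origin cannot be recovered.
assert from_malformed == from_well_formed
```

After decoding, nothing in the string says which of the two byte
sequences produced it, so encoding it back has to guess.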

It seems better to let applications whose use cases involve
representing non-Unicode values use a special-purpose extension of
their own.
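
One existing example of such an application-level extension (not
something the message above refers to, just an illustration) is
Python's "surrogateescape" error handler. It represents each
undecodable byte as a lone surrogate in U+DC80..U+DCFF. Lone surrogates
are not Unicode scalar values, so they cannot collide with anything
that well-formed UTF-8 can encode. The helper name round_trip below is
an assumption made for this sketch.

```python
def round_trip(raw: bytes) -> tuple[str, bytes]:
    """Decode bytes leniently, then encode them back with the same handler."""
    text = raw.decode("utf-8", errors="surrogateescape")
    return text, text.encode("utf-8", errors="surrogateescape")


# 0xE9 (Latin-1 "e with acute") and 0xFF are not well-formed UTF-8 here.
raw = b"caf\xe9 \xff"
text, restored = round_trip(raw)

# The malformed bytes survive the round trip unchanged.
assert restored == raw

# The intermediate string is not valid Unicode text: strict encoding
# refuses the lone surrogates, which keeps the extension private to
# the application that opted into it.
try:
    text.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True
assert strict_encode_failed
```

The cost is that such strings must not leak into interfaces that
expect well-formed Unicode, which is the sort of trade-off an
individual application can accept but a standard cannot impose on
everyone.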

-- 
Henri Sivonen
hsivonen at hsivonen.fi
https://hsivonen.fi/