PUA (BMP) planned characters HTML tables

Wed Aug 21 12:16:51 CDT 2019

On August 11, I replied to Robert Wheelock:

>> I remember a website that has tables for certain PUA precomposed
>> accented characters that aren’t yet in Unicode (things like:
>> Marshallese M/m-cedilla, H/h-acute, capital T-dieresis, capital H-
>> underbar, acute accented Cyrillic vowels, Cyrillic ER/er-caron, ...).
>
> If you are thinking of these as potential future additions to the
> standard, keep in mind that accented letters that can already be
> represented by a combination of letter + accent will not ever be
> encoded. This is one of the longest-standing principles Unicode has.

I missed the possible significance of the Latvian comma below vs. the
Marshallese cedilla, a topic that took over most of the ensuing
discussion and morphed into a debate about different user communities
and group identity.

I'd like to restate, since I think the point may have been lost, that
for the OTHER characters Robert mentioned:

> H/h-acute, capital T-dieresis, capital H-underbar, acute accented
> Cyrillic vowels, Cyrillic ER/er-caron, ...

there does not appear to be any conflicting usage between different user
communities, and no particular difficulty in rendering or otherwise
processing these as combining sequences, using up-to-date fonts and
rendering engines. I suppose Philippe's example of Võro might factor
into whether different groups prefer different appearances for h́, but
otherwise these user-perceived characters seem to be non-controversial.
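
A minimal sketch of what "combining sequence" means in practice, using
Python's standard unicodedata module. The code points are my own
assumptions about what Robert meant (in particular, "underbar" is taken
to be U+0331 COMBINING MACRON BELOW, and Cyrillic a stands in for the
accented vowels), and the helper name has_precomposed_form is
hypothetical. It checks whether NFC normalization can compose each
sequence into a single code point:

```python
import unicodedata

# Assumed base letter + combining mark sequences for the characters
# discussed above. The exact code points are illustrative guesses.
SEQUENCES = {
    "h-acute": "h\u0301",                # h + COMBINING ACUTE ACCENT
    "capital T-dieresis": "T\u0308",     # T + COMBINING DIAERESIS
    "capital H-underbar": "H\u0331",     # H + COMBINING MACRON BELOW (assumed)
    "Cyrillic a-acute": "\u0430\u0301",  # CYRILLIC SMALL LETTER A + acute
    "Cyrillic er-caron": "\u0440\u030C", # CYRILLIC SMALL LETTER ER + caron
}


def has_precomposed_form(seq: str) -> bool:
    """Return True if NFC shortens the sequence, i.e. some part of it
    composes into a precomposed character."""
    return len(unicodedata.normalize("NFC", seq)) < len(seq)


if __name__ == "__main__":
    for name, seq in SEQUENCES.items():
        status = "precomposed" if has_precomposed_form(seq) else "sequence only"
        print(f"{name}: {seq} ({status})")
    # For contrast: e + COMBINING ACUTE ACCENT does compose, to U+00E9.
    print("e-acute:", has_precomposed_form("e\u0301"))
```

A sequence that stays the same length under NFC has no atomic
equivalent in the standard, and is still a complete, valid
representation of the user-perceived character.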

So to reiterate, these characters appear vanishingly unlikely to be
atomically encoded, "yet" or ever, for good reason.

--
Doug Ewell | Thornton, CO, US | ewellic.org