PUA (BMP) planned characters HTML tables

Andrew West via Unicode unicode at unicode.org
Mon Aug 12 03:30:35 CDT 2019

On Mon, 12 Aug 2019 at 02:27, James Kass via Unicode
<unicode at unicode.org> wrote:
> On 2019-08-11 5:26 PM, [ Doug Ewell ] via Unicode wrote:
> > If you are thinking of these as potential future additions to the standard, keep in mind that accented letters that can already be represented by a combination of letter + accent will not ever be encoded. This is one of the longest-standing principles Unicode has.

People seem to be ignoring the fact that Marshallese and Latvian both
use L and N with cedilla, but with completely different glyph shapes:

> In January 2013, the Unicode Technical Committee discussed issues for the representation of
> Marshallese orthography. In particular, Marshallese uses the Latin script and requires the letters l,
> m, n, and o with cedilla. Latvian orthography uses the Latin script and requires the letters g, k, l, n,
> and r with comma below. For Marshallese, it is unacceptable to display cedillas as commas below.
> Conversely, for Latvian, it is unacceptable to display commas below as cedillas.

However, as fonts have been following Latvian practice for these
letters (cedilla is displayed as a comma below) since before Unicode,
Marshallese users cannot get their desired outcome using standard
Unicode combining diacritical marks unless they apply a font specially
designed for Marshallese -- and you can never guarantee which font the
reader will see if you are writing an email or posting on Twitter, etc.

This issue was discussed at WG2 in 2013, when there was a
recommendation to encode precomposed letters L and N with cedilla
*with no decomposition*, but that solution does not seem to have been
taken up by the UTC.
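
To illustrate why "with no decomposition" matters, here is a minimal
sketch using Python's standard unicodedata module. It is only an
illustration, not part of any proposal; the helper names describe and
nfc are my own. It shows that the existing precomposed L and N with
cedilla are canonically equivalent to letter + U+0327 COMBINING
CEDILLA, so the combining sequence and the precomposed letter cannot
be told apart after normalization, while m with cedilla has no
precomposed form at all.

```python
import unicodedata

# Existing precomposed letters whose Unicode names say "WITH CEDILLA".
PRECOMPOSED = ["\u013B", "\u013C", "\u0145", "\u0146"]  # L, l, N, n with cedilla


def describe(ch):
    """Return (character name, canonical decomposition as hex code points)."""
    return unicodedata.name(ch), unicodedata.decomposition(ch)


def nfc(text):
    """Normalize text to NFC (canonical composition)."""
    return unicodedata.normalize("NFC", text)


for ch in PRECOMPOSED:
    name, decomp = describe(ch)
    print(f"U+{ord(ch):04X} {name} -> {decomp}")

# Letter + combining cedilla composes to the existing precomposed letter,
# so both spellings end up as the same character under NFC.
print(nfc("l\u0327") == "\u013C")

# There is no precomposed m with cedilla, so the sequence stays as two
# code points under NFC.
print(len(nfc("m\u0327")))
```

A new precomposed letter without a decomposition would not be
equivalent to these sequences, which is what would let a font give it
a distinct cedilla shape.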

