PUA (BMP) planned characters HTML tables

James Kass via Unicode unicode at unicode.org
Wed Aug 14 04:05:02 CDT 2019

On 2019-08-12 8:30 AM, Andrew West wrote:
> This issue was discussed at WG2 in 2013
> (https://www.unicode.org/L2/L2013/13128-latvian-marshal-adhoc.pdf),
> when there was a recommendation to encode precomposed letters L and N
> with cedilla *with no decomposition*, but that solution does not seem
> to have been taken up by the UTC.

Group One dots their lowercase "i" letters with little flowers and Group 
Two dots theirs with little hearts.  Group Two considers flowers 
unacceptable and Group One rejects hearts.  Because of legacy character 
sets there's a precomposed character encoded called "LATIN LOWER CASE I 
WITH HEART", but it was misnamed and is normally drawn with a flower 
instead.  Group Two tries to encode "LATIN LOWER CASE I" plus "COMBINING 
HEART" to get the thing to display properly.  But because there's a 
decomposition involved, the font engine substitutes the glyph mapped to 
"LATIN LOWER CASE I WITH HEART" in the display for the string "LATIN 
LOWER CASE I" plus "COMBINING HEART".  This thwarts Group Two because 
they still get the flower.
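
For readers who want to see the character-level mechanism behind this analogy, here is a rough sketch using Python's standard unicodedata module.  The "heart" characters above are made up, so the sketch assumes the real pair from the quoted message instead: U+013C LATIN SMALL LETTER L WITH CEDILLA and the sequence "l" plus U+0327 COMBINING CEDILLA.  The helper name canonically_equivalent is mine, not part of any library.  It shows only that the two spellings are canonically equivalent; it does not show what any particular font or rendering engine does with them.

```python
import unicodedata

# Assumed stand-ins for the hypothetical "I WITH HEART" example:
precomposed = "\u013C"   # LATIN SMALL LETTER L WITH CEDILLA
sequence = "l\u0327"     # LATIN SMALL LETTER L + COMBINING CEDILLA


def canonically_equivalent(a: str, b: str) -> bool:
    """Return True if both strings have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The precomposed character carries a canonical decomposition mapping.
print(unicodedata.name(precomposed))
print(unicodedata.decomposition(precomposed))  # "006C 0327"

# Composing the sequence (NFC) yields the precomposed character, so software
# that normalizes or composes cannot tell the two spellings apart.
print(unicodedata.normalize("NFC", sequence) == precomposed)
print(canonically_equivalent(precomposed, sequence))
```

Because the two spellings are equivalent, typing the base letter plus the combining mark does not guarantee a different glyph from the precomposed one.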

The solution is to deprecate "LATIN LOWER CASE I WITH HEART".  It's only 
in there because of legacy.  Its presence guarantees round-tripping 
with legacy data but it isn't needed for modern data or display.  Urge 
Groups One and Two to encode their data with the desired combiner and 
educate font engine developers about the deprecation.  As the rendering 
engines get updated, the system substitution of the wrongly named 
precomposed glyph will go away.

This presumes that the premise of user communities feeling strongly 
about the unacceptable aspect of the variants is valid.  Since it has 
been reported and nothing seems to be happening, perhaps the casual 
users aren't terribly concerned.  It's also possible that the various 
user communities have already set up their systems to handle things 
acceptably by installing appropriate fonts.
