Why missing characters and empty code points?

Doug Ewell doug at ewellic.org
Thu May 25 11:20:15 CDT 2023


“admin” wrote:

> For example, why is MATHEMATICAL SCRIPT SMALL O missing and the
> assumed code point it would have, 1D4C4, is empty?  Yet 1D4C3
> (MATHEMATICAL SCRIPT SMALL N) and 1D4C5 (MATHEMATICAL SCRIPT SMALL P)
> are defined. Makes no sense.  Thanks

It makes sense if you look at the nameslist file, the text immediately adjacent to the code charts, or any number of other sources. There you will see that U+2134 SCRIPT SMALL O already exists, which is why a duplicate of this character was not encoded at U+1D4C4.

| 1D4C4   <reserved>
|         x (script small o - 2134)
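If you would rather check this programmatically than read the nameslist, Python's standard `unicodedata` module reports the same picture. This is a small sketch: `describe` is a hypothetical helper written for illustration, and the results depend on the Unicode version bundled with your interpreter.

```python
import unicodedata

def describe(cp):
    """Format a code point as 'U+XXXX <general category> <name>'.

    `describe` is an illustrative helper, not part of any library.
    Unassigned code points have general category 'Cn' and no name.
    """
    ch = chr(cp)
    name = unicodedata.name(ch, None)
    return f"U+{cp:04X} {unicodedata.category(ch)} {name or '<unassigned>'}"

# The "hole" at U+1D4C4 sits between two assigned letters; the script small o
# lives in the Letterlike Symbols block instead.
for cp in (0x1D4C3, 0x1D4C4, 0x1D4C5, 0x2134):
    print(describe(cp))
```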

Duplicate characters are generally not encoded simply to fill holes in the coding space. The question of “what is a duplicate” becomes complex, especially for those new to the Unicode/10646 character identification process, partly because some duplicates or near-duplicates do exist for legacy compatibility purposes, and partly because lookalike characters in different scripts (such as Latin A and Greek Α and Cyrillic А) are correctly not unified.
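A quick way to see both halves of that distinction, again using Python's standard `unicodedata` module as a sketch (exact behavior depends on the interpreter's bundled Unicode version): the mathematical script letters carry compatibility (`<font>`) decompositions, so NFKC folds them to plain Latin letters, whereas the Latin, Greek, and Cyrillic capitals are separate characters that NFKC leaves distinct.

```python
import unicodedata

# Three visually similar capital letters from different scripts.
lookalikes = ["A", "\u0391", "\u0410"]

for ch in lookalikes:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# NFKC folds MATHEMATICAL SCRIPT SMALL N (a compatibility character) to Latin "n"...
print(unicodedata.normalize("NFKC", "\U0001D4C3") == "n")
# ...and SCRIPT SMALL O to Latin "o"...
print(unicodedata.normalize("NFKC", "\u2134") == "o")
# ...but leaves the Greek and Cyrillic capitals distinct from Latin A.
print({unicodedata.normalize("NFKC", ch) for ch in lookalikes} == set(lookalikes))
```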

You began your post with “For example.” Please check the sources mentioned (or ask if you don’t know how to find them) for the other characters you feel are missing, and then check back in if you have additional questions.

Thanks,

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org
