Reserved character issue

Ken Whistler kenwhistler at sonic.net
Sun Oct 8 09:46:25 CDT 2023


Julian has provided the explanation. The tooling that produces the code 
charts has logic for suppressing the display of overly long ranges of 
reserved code points, which would serve no purpose if displayed.

When trying to get accurate character counts of any particular type, one 
should always depend on the data files in the UCD directly, rather than 
attempting to deconstruct values from the code charts.

For counts by General_Category values, including gc=Cn, see

https://www.unicode.org/Public/UCD/latest/ucd/extracted/DerivedGeneralCategory.txt
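
As a rough illustration of counting from the data file rather than the 
charts, here is a minimal Python sketch. It assumes the usual UCD line 
format (a code point or "first..last" range, a semicolon, the property 
value, then an optional "#" comment). The function name count_by_gc and 
the sample excerpt are hypothetical, chosen only for this example.

```python
from collections import Counter

def count_by_gc(lines):
    """Tally code points per General_Category value from UCD-style lines."""
    counts = Counter()
    for line in lines:
        # Drop trailing comments; this also skips "# @missing" header lines.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        cp_range, gc = (field.strip() for field in data.split(";")[:2])
        # A line holds either a single code point or a "first..last" range.
        first, _, last = cp_range.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        counts[gc] += end - start + 1
    return counts

# Illustrative excerpt in the file's format (not the full file).
sample = """\
0000..001F    ; Cc #  [32] <control-0000>..<control-001F>
007F..009F    ; Cc #  [33] <control-007F>..<control-009F>
0020          ; Zs #       SPACE
"""

counts = count_by_gc(sample.splitlines())
print(counts["Cc"], counts["Zs"])
```

To run this against the real data, one could pass an open file object 
for a downloaded copy of DerivedGeneralCategory.txt instead of the 
sample lines. If a version of the file did not list the gc=Cn ranges 
explicitly, that count could instead be derived by subtracting all other 
totals from 0x110000.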

Note that gc=Cn also is not quite the same as "reserved", because that 
particular gc value combines both reserved code points and noncharacter 
code points.
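
To make that distinction concrete, here is a small Python sketch that 
separates the two groups. It relies on the fixed set of 66 noncharacters 
(U+FDD0..U+FDEF plus the last two code points of each of the 17 planes). 
The names is_noncharacter and reserved_count are hypothetical, and the 
gc=Cn total has to be supplied from the UCD data for the version of 
interest.

```python
def is_noncharacter(cp):
    """True for the 66 noncharacter code points."""
    # U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF in every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

# Count noncharacters across the whole code space, U+0000..U+10FFFF.
NONCHARACTER_COUNT = sum(is_noncharacter(cp) for cp in range(0x110000))

def reserved_count(cn_count):
    """Reserved code points = gc=Cn total minus the noncharacters."""
    return cn_count - NONCHARACTER_COUNT

print(NONCHARACTER_COUNT)  # 66
```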

For most public purposes other than detailed implementations, there are 
also somewhat simplified, but handy character count statistics available 
for every version of the Unicode Standard:

https://www.unicode.org/versions/stats/

Those statistics can be used to answer general questions such as: 
"How many characters are in Unicode?"

--Ken

On 10/7/2023 12:58 PM, Julian Bradfield via Unicode wrote:
> From TUS chapter 24, page 951: Reserved Characters. Character codes 
> that are marked “<reserved>” are unassigned and reserved for future 
> encoding. Reserved codes are indicated by a  glyph. To ensure 
> readability, many instances of reserved characters have been suppressed 
> from the names list. 

