Reserved character issue
Ken Whistler
kenwhistler at sonic.net
Sun Oct 8 09:46:25 CDT 2023
Julian has provided the explanation. The tooling that produces the code
charts has logic to suppress overly long ranges of reserved code points,
since displaying them in the charts would serve no purpose.
When trying to get accurate character counts of any particular type, one
should always depend on the data files in the UCD directly, rather than
attempting to deconstruct values from the code charts.
For counts by General_Category values, including gc=Cn, see
https://www.unicode.org/Public/UCD/latest/ucd/extracted/DerivedGeneralCategory.txt
Note that gc=Cn also is not quite the same as "reserved", because that
particular gc value combines both reserved code points and noncharacter
code points.
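
As a rough sketch of counting from the data file rather than the charts,
the Python below tallies code points per General_Category value from
lines in the DerivedGeneralCategory.txt format and then separates the 66
noncharacters from the gc=Cn total. The names count_by_gc,
is_noncharacter and reserved_count are made up for this illustration and
are not part of any UCD tooling, and SAMPLE is only a short illustrative
excerpt, not the real file.

```python
import re
from collections import Counter

# Matches data lines such as
#   0378..0379    ; Cn #   [2] <reserved-0378>..<reserved-0379>
#   00AA          ; Lo #       FEMININE ORDINAL INDICATOR
# Comment lines and blank lines do not match and are skipped.
LINE_RE = re.compile(
    r"^([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;\s*(\w+)"
)


def count_by_gc(lines):
    """Sum the number of code points listed for each gc value."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue
        first = int(m.group(1), 16)
        last = int(m.group(2), 16) if m.group(2) else first
        counts[m.group(3)] += last - first + 1
    return counts


def is_noncharacter(cp):
    """True for U+FDD0..U+FDEF and the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


# 32 in the FDD0..FDEF block plus 2 in each of the 17 planes.
NONCHARACTER_COUNT = sum(is_noncharacter(cp) for cp in range(0x110000))


def reserved_count(counts):
    """gc=Cn minus noncharacters. Only meaningful for the complete file."""
    return counts["Cn"] - NONCHARACTER_COUNT


# Illustrative excerpt only (an assumption for the example, not the full data).
SAMPLE = """\
# DerivedGeneralCategory (excerpt)
0378..0379    ; Cn #   [2] <reserved-0378>..<reserved-0379>
FDD0..FDEF    ; Cn #  [32] <noncharacter-FDD0>..<noncharacter-FDEF>
0041..005A    ; Lu #  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
00AA          ; Lo #       FEMININE ORDINAL INDICATOR
""".splitlines()

sample_counts = count_by_gc(SAMPLE)
```

To get real numbers, one would pass the lines of a downloaded copy of the
file above to count_by_gc and then call reserved_count on the result; on
the short excerpt here, reserved_count would not be meaningful.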
For most public purposes other than detailed implementations, there are
also somewhat simplified, but handy character count statistics available
for every version of the Unicode Standard:
https://www.unicode.org/versions/stats/
Those statistics can be used to answer general questions such as:
"How many characters are in Unicode?"
--Ken
On 10/7/2023 12:58 PM, Julian Bradfield via Unicode wrote:
> From TUS chapter 24, page 951: Reserved Characters. Character codes
> that are marked “<reserved>” are unassigned and reserved for future
> encoding. Reserved codes are indicated by a glyph. To ensure
> readability, many instances of reserved characters have been suppressed
> from the names list.