Format A
Doug Ewell via Unicode
unicode at unicode.org
Thu May 30 11:49:13 CDT 2019
Apologies if this is a repeat of a (much) earlier inquiry.
The mapping tables that are available as part of the Unicode Standard
(http://www.unicode.org/Public/MAPPINGS/) are generally provided in a
text format called "Format A." Each line in the file defines a mapping
between a character in a legacy encoding and its Unicode equivalent,
with fields separated by tabs or sequences of spaces, like this:
0xA0 0x00A0 #NO-BREAK SPACE
0xA1 0x00A1 #INVERTED EXCLAMATION MARK
0xA2 0x00A2 #CENT SIGN
The format supports DBCS as well:
0x8140 0x4E02 #CJK UNIFIED IDEOGRAPH
0x8141 0x4E04 #CJK UNIFIED IDEOGRAPH
0x8142 0x4E05 #CJK UNIFIED IDEOGRAPH
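Absent a formal spec, here is a minimal reader for the conventions visible in the samples above. It is a sketch under my own assumptions, not a reference implementation. It assumes that fields are separated by tabs or runs of spaces, that "#" starts a trailing comment holding the character name, and that both code fields are 0x-prefixed hex. It also assumes that the number of hex digits in the legacy field gives its byte width (2 digits for SBCS, 4 for DBCS), and that a line with no Unicode field marks an unmapped code. The function name parse_format_a is hypothetical.

```python
def parse_format_a(text):
    """Parse Format A lines into a dict of legacy byte sequence -> Unicode char.

    Assumed conventions (inferred from sample files, not from a spec):
    whitespace-separated fields, '#' begins a comment, 0x-prefixed hex codes.
    """
    mapping = {}
    for line in text.splitlines():
        # Drop the trailing comment (the character name) and outer whitespace.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # blank or comment-only line
        fields = data.split()  # splits on tabs and runs of spaces alike
        if len(fields) < 2:
            continue  # assumed: legacy code with no Unicode mapping
        legacy_hex, unicode_hex = fields[0], fields[1]
        # Hex digit count sets byte width: 0xA0 -> 1 byte, 0x8140 -> 2 bytes.
        digits = legacy_hex[2:]
        if len(digits) % 2:
            digits = "0" + digits
        legacy = bytes.fromhex(digits)
        # int() with base 16 accepts the optional 0x prefix.
        mapping[legacy] = chr(int(unicode_hex, 16))
    return mapping
```

Keying the table by byte sequence lets single- and double-byte entries share one dictionary. Actually decoding a DBCS stream still requires knowing which bytes are lead bytes, and the format does not appear to state that explicitly.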
My questions are:
1. Is there a specification for this format anywhere, and if so, where?
2. Is there a "Format B" or similar? (I don't mean UCM, CharMapML, RFC
1345 format, etc., but something truly similar to and/or derivative of
Format A.)
Please reply on-list only if you think the list at large would benefit
from your reply. I'm hoping some of the Unicode elders might have some
insight here.
--
Doug Ewell | Thornton, CO, US | ewellic.org