Format A

Doug Ewell via Unicode unicode at unicode.org
Thu May 30 11:49:13 CDT 2019


Apologies if this is a repeat of a (much) earlier inquiry.
 
The mapping tables that are available as part of the Unicode Standard
(http://www.unicode.org/Public/MAPPINGS/) are generally provided in a
text format called "Format A." Each line in the file defines a mapping
between a character in a legacy encoding and the Unicode equivalent,
with fields separated by tabs or sequences of spaces, like this:
 
0xA0	0x00A0	#NO-BREAK SPACE
0xA1	0x00A1	#INVERTED EXCLAMATION MARK
0xA2	0x00A2	#CENT SIGN
 
The format supports DBCS as well:
 
0x8140	0x4E02	#CJK UNIFIED IDEOGRAPH
0x8141	0x4E04	#CJK UNIFIED IDEOGRAPH
0x8142	0x4E05	#CJK UNIFIED IDEOGRAPH
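 
For illustration only, here is a minimal Python sketch of reading lines like
the ones above. It is not taken from any specification. It assumes that "#"
starts a comment, that the first two whitespace-separated fields are the
legacy code and the Unicode code point in hexadecimal, and that lines with
fewer than two fields (blank, comment-only, or unmapped) can be skipped. The
function name parse_format_a is hypothetical.

```python
def parse_format_a(text):
    """Parse Format A-style text into a dict {legacy_code: unicode_code_point}.

    Assumptions (not from a published spec): '#' begins a comment, fields are
    separated by tabs or runs of spaces, and both values are hex with an
    optional 0x prefix.
    """
    mapping = {}
    for line in text.splitlines():
        # Drop the trailing comment, then split on any run of whitespace.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank line, comment-only line, or no Unicode mapping
        try:
            legacy = int(fields[0], 16)   # int() accepts the "0x" prefix in base 16
            unicode_cp = int(fields[1], 16)
        except ValueError:
            continue  # anything this sketch does not understand is skipped
        mapping[legacy] = unicode_cp
    return mapping


sample = (
    "0xA0\t0x00A0\t#NO-BREAK SPACE\n"
    "0x8140  0x4E02  #CJK UNIFIED IDEOGRAPH\n"
)
table = parse_format_a(sample)
```

Because the legacy code is stored as a plain integer, the same loop handles
both the single-byte and the DBCS examples. Silently skipping unparseable
lines is a simplification that a real reader of these files might not want.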
 
My questions are:
 
1. Is there a specification for this format anywhere, and if so, where?
 
2. Is there a "Format B" or similar? (I don't mean UCM, CharMapML, RFC
1345 format, etc., but something truly similar to and/or derivative of
Format A.)
 
Please reply on-list only if you think the list at large would benefit
from your reply. I'm hoping some of the Unicode elders might have some
insight here.
 
--
Doug Ewell | Thornton, CO, US | ewellic.org
 


