Use of tag characters in a private encoding - is it valid please?

William_J_G Overington wjgo_10009 at btinternet.com
Sat Apr 27 10:59:00 CDT 2024


Erik Carvalhal Miller wrote as follows.
 
> The tag characters being default ignorable, I would expect a single
> missing‐glyph glyph (representing the private‐use code point) to be more
> likely, though your mileage may vary. Here, for example, is a sequence
> containing 15 tag characters between a private‐use base and the CANCEL
> TAG:
⟨􏿽󠁔󠁨󠁩󠁳󠀠󠁩󠁳󠀠󠁡󠀠󠁴󠁥󠁳󠁴󠀮󠁿⟩
 
Thank you for posting the example.
 
In the Unicode mailing list archive and in the webmail that I use, the 
display showed the mathematical brackets with a single missing-glyph 
indicator between them.
When I copied the example onto the clipboard and then pasted it into 
WordPad, the display showed the mathematical brackets with seventeen 
missing-glyph indicators between them: one for the base character, 
fifteen for the tag characters and one for the CANCEL TAG.
 
I saved the text from WordPad using the Save as feature, choosing the 
file format that WordPad names as a Unicode Text Document. This is a 
UTF-16 format file with a BYTE ORDER MARK that indicates, in this 
particular file, that the low byte is stored before the high byte, 
that is, little-endian. I used tags.txt as the file name.
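 
As a rough sketch of what such a file would be expected to contain, the 
following Python rebuilds the example from code points and encodes it as 
UTF-16 with the low byte first. It is an illustration under stated 
assumptions, not a dump of the actual tags.txt: the brackets are assumed 
to be U+27E8 and U+27E9, and the names BASE, TAG_OFFSET and CANCEL_TAG 
are mine.

```python
# Sketch only: rebuilds the example from code points rather than reading tags.txt.
BASE = 0x10FFFD        # private-use base character
TAG_OFFSET = 0xE0000   # tag character = ASCII code point + 0xE0000
CANCEL_TAG = 0xE007F

message = "This is a test."
sequence = (
    chr(BASE)
    + "".join(chr(TAG_OFFSET + ord(c)) for c in message)
    + chr(CANCEL_TAG)
)
text = "\u27E8" + sequence + "\u27E9"  # assumed: mathematical angle brackets

# BYTE ORDER MARK U+FEFF, then the text, all as little-endian 16-bit units.
data = ("\uFEFF" + text).encode("utf-16-le")

# Print the bytes two at a time, one 16-bit code unit per group.
print(" ".join(data[i:i + 2].hex() for i in range(0, len(data), 2)))
```

The first two bytes are FF FE, which is the BYTE ORDER MARK with the low 
byte first. The base character needs a surrogate pair, DBFF DFFD, which 
is stored as the bytes FF DB FD DF. Each tag character likewise occupies 
four bytes.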
 
I then opened the tags.txt file in the ViewHex.exe program that Erwin 
Denissen had kindly posted in 2009.
 
https://forum.high-logic.com/viewtopic.php?p=10579#p10579 
 
From there I can note that the message carried by the fifteen tag 
characters is as follows.
 
This is a test.
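 
The same reading can be done without a hex viewer, because each tag 
character is its ASCII counterpart plus 0xE0000. The following is a 
minimal sketch; the function name decode_tags is hypothetical, and it 
assumes the tag characters of interest come before the first CANCEL TAG.

```python
TAG_FIRST, TAG_LAST = 0xE0020, 0xE007E  # tag characters mirroring ASCII 0x20..0x7E
CANCEL_TAG = 0xE007F
TAG_OFFSET = 0xE0000

def decode_tags(text):
    """Return the ASCII message carried by tag characters, stopping at CANCEL TAG."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp == CANCEL_TAG:
            break
        if TAG_FIRST <= cp <= TAG_LAST:
            out.append(chr(cp - TAG_OFFSET))
        # Anything else (the base character, brackets) is skipped.
    return "".join(out)

# Rebuild the posted sequence from code points and decode it.
sample = (
    chr(0x10FFFD)
    + "".join(chr(TAG_OFFSET + ord(c)) for c in "This is a test.")
    + chr(CANCEL_TAG)
)
print(decode_tags(sample))
```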
 
I used the Edit Search... facility of the FontCreator program to find 
that the base character used is as follows.
 
U+10FFFD
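 
As an alternative to a font editor search, the code points can also be 
checked against the Unicode Character Database. This is only a sketch 
using Python's standard unicodedata module; the helper name describe is 
my own.

```python
import unicodedata

def describe(cp):
    """Return (U+ label, general category) for a code point."""
    return f"U+{cp:04X}", unicodedata.category(chr(cp))

# Base character, the code point just after it, a tag character, CANCEL TAG.
for cp in (0x10FFFD, 0x10FFFE, 0xE0054, 0xE007F):
    print(*describe(cp))
```

U+10FFFD reports category Co (private use) and is the last private-use 
code point in plane 16, because U+10FFFE is a noncharacter (Cn). The tag 
characters and the CANCEL TAG report Cf (format).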
 
A good choice, at the top of the map, so maybe it can be thought of as 
NORTH STAR.
 
There was a famous early steam locomotive named NORTH STAR so one may, 
if one so chooses, think of the NORTH STAR locomotive hauling a train of 
tag characters with the CANCEL TAG as a brake van at the end of the 
train.
 
https://en.wikipedia.org/wiki/GWR_Star_Class
 
An OpenType font used in an OpenType-aware application program can be 
used to decode the base character and the sequence of tag characters.
 
I tried the technique when the QID emoji proposal was being considered 
and the technique worked well.
 
https://forum.high-logic.com/viewtopic.php?p=39337 
 
The technique can be compared and contrasted with the use of a direct 
Private Use Area encoding for a character that one has designed.
 
A direct Private Use Area encoding is easier to use, whereas a tag 
sequence encoding offers a greater chance of a unique encoding, which 
assists unambiguous interoperability and archiving.
    
William Overington
 
Saturday 27 April 2024
 
 