Use of tag characters in a private encoding - is it valid please?
William_J_G Overington
wjgo_10009 at btinternet.com
Sat Apr 27 10:59:00 CDT 2024
Erik Carvalhal Miller wrote as follows.
> The tag characters being default ignorable, I would expect a single
missing‐glyph glyph (representing the private‐use code point) to be more
likely, though your mileage may vary. Here, for example, is a sequence
containing 15 tag characters between a private‐use base and the CANCEL
TAG:
⟨⟩
Thank you for posting the example.
In the Unicode mailing list archive, and in the webmail that I use, the
display was of the mathematical brackets with a single glyph between
them that indicates a missing glyph.
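When I copied the example onto the clipboard and then pasted it into
WordPad, the display was of the mathematical brackets with seventeen of
the glyphs that each indicate a missing glyph between the mathematical
brackets. That count matches one glyph each for the private-use base
character, the fifteen tag characters and the CANCEL TAG.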
I saved the text from WordPad using the Save as feature, choosing the
file format that WordPad names as a Unicode Text Document. That is a
UTF-16 format file with a BYTE ORDER MARK that indicates, in this
particular file, that the low byte of each 16-bit unit is stored before
the high byte (little-endian). I used tags.txt as the file name.
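The file layout described above can be sketched in Python. This is only an illustration of UTF-16 little-endian with a BYTE ORDER MARK, not a statement of how WordPad itself writes the file. The function names are my own, chosen for the example.

```python
import codecs

def to_unicode_text_document(text: str) -> bytes:
    """Encode text as UTF-16 little-endian, preceded by a BYTE ORDER MARK."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")

def from_unicode_text_document(data: bytes) -> str:
    """Decode UTF-16 data, using the BYTE ORDER MARK to choose the byte order."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    raise ValueError("no UTF-16 BYTE ORDER MARK found")

# A supplementary-plane code point is stored as a surrogate pair,
# and within each 16-bit unit the low byte comes first.
sample = "\U0010FFFD"
data = to_unicode_text_document(sample)
print(data.hex(" "))  # ff fe ff db fd df
```

In a hex viewer the first two bytes, FF FE, are the BYTE ORDER MARK, and each character beyond the Basic Multilingual Plane occupies four bytes.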
I then opened the tags.txt file in the ViewHex.exe program that Erwin
Denissen had kindly posted in 2009.
https://forum.high-logic.com/viewtopic.php?p=10579#p10579
From there I can note that the fifteen tag character message is as
follows.
This is a test.
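Reading the message out of the tag characters can be sketched as follows, relying on the fact that the tag characters U+E0020 to U+E007E mirror ASCII 0x20 to 0x7E at an offset of 0xE0000, with U+E007F being CANCEL TAG. The function names are my own, and the base U+10FFFD is the private-use character found in this particular example.

```python
from typing import Tuple

TAG_OFFSET = 0xE0000   # tag character = ASCII code + 0xE0000
CANCEL_TAG = 0xE007F

def encode_tag_sequence(base: str, message: str) -> str:
    """Return base + one tag character per ASCII character + CANCEL TAG."""
    for ch in message:
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"not printable ASCII: {ch!r}")
    tags = "".join(chr(TAG_OFFSET + ord(ch)) for ch in message)
    return base + tags + chr(CANCEL_TAG)

def decode_tag_sequence(sequence: str) -> Tuple[str, str]:
    """Split a sequence into (base, message), stopping at CANCEL TAG."""
    base, rest = sequence[0], sequence[1:]
    chars = []
    for ch in rest:
        cp = ord(ch)
        if cp == CANCEL_TAG:
            break
        if not 0xE0020 <= cp <= 0xE007E:
            raise ValueError(f"not a tag character: U+{cp:04X}")
        chars.append(chr(cp - TAG_OFFSET))
    return base, "".join(chars)

sequence = encode_tag_sequence("\U0010FFFD", "This is a test.")
print(len(sequence))  # 17 code points: base + 15 tags + CANCEL TAG
base, message = decode_tag_sequence(sequence)
print(message)        # This is a test.
```

The count of seventeen code points agrees with the seventeen missing-glyph glyphs that WordPad displayed.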
I used the Edit Search... facility of the FontCreator program to find
that the base character used is as follows.
U+10FFFD
A good choice, at the top of the map, so maybe it can be thought of as
NORTH STAR
There was a famous early steam locomotive named NORTH STAR so one may,
if one so chooses, think of the NORTH STAR locomotive hauling a train of
tag characters with the CANCEL TAG as a brake van at the end of the
train.
https://en.wikipedia.org/wiki/GWR_Star_Class
An OpenType font in an OpenType-aware application program can be used
to decode the base character and the sequence of tag characters.
I tried the technique when the QID emoji proposal was being considered
and the technique worked well.
https://forum.high-logic.com/viewtopic.php?p=39337
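As a rough model of the idea, an OpenType ligature substitution replaces a listed sequence of glyphs with a single glyph. The Python sketch below imitates that on code points. It is a toy model under my own assumptions, not the font file format and not how any shaping engine is implemented. The glyph name and the payload "Q42" are hypothetical.

```python
from typing import Dict, List, Tuple

def apply_ligatures(codepoints: List[int],
                    ligatures: Dict[Tuple[int, ...], str]) -> List[str]:
    """Replace each listed code point sequence with one glyph name."""
    # Try longer sequences first so a long rule wins over its own prefix.
    rules = sorted(((seq, name) for seq, name in ligatures.items() if seq),
                   key=lambda item: len(item[0]), reverse=True)
    out: List[str] = []
    i = 0
    while i < len(codepoints):
        for seq, name in rules:
            if tuple(codepoints[i:i + len(seq)]) == seq:
                out.append(name)
                i += len(seq)
                break
        else:
            # No rule matched: fall back to a per-code-point label.
            out.append(f"U+{codepoints[i]:04X}")
            i += 1
    return out

# Base U+10FFFD, then tag characters spelling a hypothetical payload, then CANCEL TAG.
tagged = [0x10FFFD] + [0xE0000 + ord(c) for c in "Q42"] + [0xE007F]
rules = {tuple(tagged): "hypothetical_glyph_Q42"}
print(apply_ligatures([0x27E8] + tagged + [0x27E9], rules))
# ['U+27E8', 'hypothetical_glyph_Q42', 'U+27E9']
```

A sequence with no matching rule falls through one code point at a time, which corresponds loosely to the display of separate missing-glyph glyphs.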
The technique can be compared and contrasted with the use of a direct
Private Use Area encoding for a character that one has designed.
A direct Private Use Area encoding is easier to use, whereas a tag
sequence encoding provides scope for a greater chance of a unique
encoding, thereby assisting unambiguous interoperability and archiving.
William Overington
Saturday 27 April 2024