What should or should not be encoded in Unicode? (from Re: Egyptian Hieroglyph Man with a Laptop)
firstname.lastname@example.org via Unicode
unicode at unicode.org
Sat Feb 15 14:46:54 CST 2020
Joel Kalvesmaki asks nine questions, six in the first block and three in
the second block.
Numbering the questions from 1 to 9 in the order in which they are
asked, I do not, at present, understand many of them, and I can, at
present, answer only question 7 definitively. Some questions may need an
answer in two parts: one about my specific project, and the other about
what happens if one or more other people also decide to have their own
encoding space in a similar manner.
I realize that not even understanding the question at this time may not
sound very good to some of the people who do understand the question,
but I am not someone who knowingly purports that he knows what he is
talking about when he does not. I am a researcher and, as I am now
aware of these questions, I need to find out so that in the future
I can answer such questions with a sound background knowledge of the
It might be that I know of some matters but that I am not aware of the
parlance used to describe them in the post to which I am replying.
So now to my thoughts on some of the questions.
1 to 4. I do not at present understand the question.
5. Perhaps, independent of each other, you bind !123 to a character
semantically identical to one I've bound to !234. What rules are in
place to allow interchangeability?
I am not sure this is the best possible answer, but with care the
problem should not arise in the first place. I am thinking that people
could perhaps avoid it by using an informal discussion method similar
to that used when proposing a new alt. group in the Usenet system that
was in widespread use before the web was
6. I do not at present understand the question.
7. Or maybe you're not so much concerned about interoperability as are
you are with extending the PUA block beyond its current limits?
No, absolutely not. I have used the Private Use Areas on a number of
occasions and found them extremely useful to have available. Yet any
assignment is not unique and, except in very limited prearranged
circumstances, interoperability is not possible. My research project
is very much about interoperability with provenance.
Interoperability with provenance is central to what I am trying to
8. Something like SGML/XML entities?
Until mention in the post to which I am replying, I had never known of
9. Couldn't you simply capitalize on the rules that already exist for
entities?
From what I have read about them today, well, I suppose that I could,
but that is not my approach and I am not going to use them.
My items are not emoji, but emoji are expressed either by an atomic
character or by a sequence of atomic characters, such sequences being
decoded upon reception to produce a glyph. My proposed system uses
sequences of atomic characters such that the sequences could be decoded
upon reception to produce localized output. A similar yet different
process.
I simply do not want, as a design choice, all that angled bracket stuff;
it is just not what I am trying to do.
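To illustrate what "decoded upon reception to produce localized output"
might look like in practice, here is a minimal sketch in Python. It is
not the actual system: the table LOCALIZED, its contents, the language
tags and the function name localize are all invented for illustration,
and the only thing taken from the thread is the !123 style of code with
at least three digits.

```python
import re

# Hypothetical lookup table: code number -> {language tag: text}.
# The entries are invented for illustration only.
LOCALIZED = {
    "123": {"en": "Good morning.", "fr": "Bonjour."},
}

# An exclamation mark followed by three or more ASCII digits.
CODE = re.compile(r"!([0-9]{3,})")


def localize(text: str, lang: str) -> str:
    """Replace each known !ddd code in text with its text for lang.

    Codes that are unknown, or that have no entry for lang, are left
    unchanged so that no information is lost on reception.
    """
    def replace(match: re.Match) -> str:
        entry = LOCALIZED.get(match.group(1), {})
        return entry.get(lang, match.group(0))

    return CODE.sub(replace, text)
```

The same message, for example "!123", would then be shown differently
to recipients who have chosen different languages, while the text that
was actually transmitted stays the same.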
If anyone on this mailing list understands some or all of what I do
not, your comments in this thread would be very welcome.
The first three links on my webspace are relevant to my research
The website is safe to use. It is hosted on a server run these days by
Plusnet PLC, a United Kingdom internet service provider. It is not
hosted on my computer.
Saturday 15 February 2020
------ Original Message ------
From: "via Unicode" <unicode at unicode.org>
To: wjgo_10009 at btinternet.com
Cc: unicode at unicode.org
Sent: Saturday, 2020 Feb 15 At 10:11
Subject: Re: What should or should not be encoded in Unicode? (from Re:
Egyptian Hieroglyph Man with a Laptop)
I don't fully understand your proposed encoding scheme (e.g., Is there a
namespace each encoding scheme is bound to? How do namespaces get
encoded? How are syntax strictures encoded?), but even then, presuming
it's sound, you've said in the message before that this encoding space
will enhance interoperability. What mechanism is in place to make my
encoding space interoperable with yours? Perhaps, independent of each
other, you bind !123 to a character semantically identical to one I've
bound to !234. What rules are in place to allow interchangeability? What
about one-to-many or many-to-many or vague or ambiguous mappings across
encoding schemes, or mappings that we might reasonably contest?
Or maybe you're not so much concerned about interoperability as are you
are with extending the PUA block beyond its current limits? Something
like SGML/XML entities? Couldn't you simply capitalize on the rules that
already exist for entities?
Director, Text Alignment Network
On 2020-02-14 15:52, wjgo_10009 at btinternet.com via Unicode wrote:
The solution is to invent my own encoding space. This sits on top of
Unicode, could be (perhaps?) called markup, but it works!
It may be perilous, because some software may enforce the strict
official code point limits.
I have now realized that what I wrote before is ambiguous.
When I wrote "sits on top of Unicode" I was not meaning at some code
points above U+10FFFF in the Unicode map, though I accept that it
could quite reasonably be read as meaning that.
My encoding space sits on top of Unicode in the sense that it uses a
sequence of regular Unicode characters for each code point in my
a character sequence of a base character, followed by a tag
exclamation mark followed by three tag digits and a cancel tag.
All three examples above have the same meaning.
∫⑦⑧① is useful because it is less likely than !123 to occur in text
for any other reason, though !123 is easier to use and could be used
in a GS1-128 barcode.
The tag sequence has the potential to become incorporated into Unicode
for universal standardization of unambiguous interoperability
everywhere. That is a long term goal for me.
The example above uses a three-digit code number. My encoding space
allows for various numbers of digits, with a minimum of three digits
and a much larger theoretical maximum. The most digits in use at
present in my research project in any one code number is six.
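The forms described above can be sketched in Python as follows. This is
only a sketch under stated assumptions: the function names and the
constant MIN_DIGITS are invented here, and the base character is passed
in as a parameter because this message does not say which one is used.
The tag characters themselves are the standard Unicode ones, each being
the ASCII character's code point plus 0xE0000, with CANCEL TAG at
U+E007F.

```python
import re

TAG_OFFSET = 0xE0000        # ASCII code point + offset = tag character
CANCEL_TAG = "\U000E007F"   # U+E007F CANCEL TAG
MIN_DIGITS = 3              # minimum number of digits in a code number


def to_plain(code: str) -> str:
    """Return the plain-text form of a code number, e.g. '!123'."""
    if not (code.isascii() and code.isdigit() and len(code) >= MIN_DIGITS):
        raise ValueError("code number needs at least three ASCII digits")
    return "!" + code


def to_tag_sequence(code: str, base: str) -> str:
    """Return base + tag exclamation mark + tag digits + cancel tag.

    The choice of base character is an assumption left to the caller.
    """
    tags = "".join(chr(TAG_OFFSET + ord(ch)) for ch in to_plain(code))
    return base + tags + CANCEL_TAG


# Finds plain-text codes: '!' followed by three or more ASCII digits.
PLAIN_PATTERN = re.compile(r"!([0-9]{3,})")


def find_plain_codes(text: str) -> list[str]:
    """Return the code numbers of all plain-text codes found in text."""
    return PLAIN_PATTERN.findall(text)
```

For example, to_tag_sequence("123", base) yields the base character
followed by U+E0021, U+E0031, U+E0032, U+E0033 and U+E007F, and
find_plain_codes picks out both three-digit and six-digit codes from
running text.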
Friday 14 February 2020