What should or should not be encoded in Unicode? (from Re: Egyptian Hieroglyph Man with a Laptop)

via Unicode unicode at unicode.org
Sat Feb 15 04:11:48 CST 2020

Hi William,

I don't fully understand your proposed encoding scheme (e.g., Is there a 
namespace each encoding scheme is bound to? How do namespaces get 
encoded? How are syntax strictures encoded?), but even presuming 
it's sound, you said in your previous message that this encoding space 
will enhance interoperability. What mechanism is in place to make my 
encoding space interoperable with yours? Perhaps, independent of each 
other, you bind !123 to a character semantically identical to one I've 
bound to !234. What rules are in place to allow interchangeability? What 
about one-to-many or many-to-many or vague or ambiguous mappings across 
encoding schemes, or mappings that we might reasonably contest?
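
To make the question concrete, here is a minimal sketch in Python. It assumes two privately maintained code tables and a separately agreed crosswalk between them. The codes beyond !123 and !234, the crosswalk itself, and the function name translate are hypothetical illustrations, not part of any proposal in this thread.

```python
from collections import defaultdict

# Hypothetical crosswalk agreed between two independent schemes.
# Each pair says "code in scheme A corresponds to code in scheme B".
# Nothing in the code numbers themselves reveals this relationship;
# it has to be declared and maintained somewhere.
CROSSWALK = [
    ("!123", "!234"),  # the one-to-one case described above
    ("!500", "!610"),  # hypothetical one-to-many case ...
    ("!500", "!611"),  # ... the same A code maps to two B codes
]


def translate(code, pairs=CROSSWALK):
    """Return the set of scheme-B codes declared equivalent to `code`.

    An empty set means no mapping was declared; a set with more than
    one member means the mapping is ambiguous and needs a further rule.
    """
    targets = defaultdict(set)
    for a_code, b_code in pairs:
        targets[a_code].add(b_code)
    return targets.get(code, set())


print(translate("!123"))          # exactly one target
print(len(translate("!500")))     # two targets: ambiguous
print(translate("!999"))          # no declared mapping
```

Even this toy version needs answers to the questions above: who owns the crosswalk, and what a consumer should do when the result is empty or has more than one member.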

Or maybe you're not so much concerned about interoperability as you 
are with extending the PUA block beyond its current limits? Something 
like SGML/XML entities? Couldn't you simply capitalize on the rules that 
already exist for entities?

Best wishes,

Joel Kalvesmaki
Director, Text Alignment Network

On 2020-02-14 15:52, wjgo_10009 at btinternet.com via Unicode wrote:
>>> The solution is to invent my own encoding space. This sits on top of 
>>> Unicode, could be (perhaps?) called markup, but it works!
>> It may be perilous, because some software may enforce the strict 
>> official code point limits.
> I have now realized that what I wrote before is ambiguous.
> When I wrote "sits on top of Unicode" I was not meaning at some code
> points above U+10FFFF in the Unicode map, though I accept that it
> could quite reasonably be read as meaning that.
> My encoding space sits on top of Unicode in the sense that it uses a
> sequence of regular Unicode characters for each code point in my
> encoding space.
> For example
> ∫⑦⑧①
> or
> !781
> or
> a character sequence of a base character, followed by a tag
> exclamation mark followed by three tag digits and a cancel tag.
> All three examples above have the same meaning.
> ∫⑦⑧① is useful because it is less likely than !123 to occur otherwise
> in ordinary text, though !123 is easier to use and could be used in a
> GS1-128 barcode.
> The tag sequence has the potential to become incorporated into Unicode
> for universal standardization of unambiguous interoperability
> everywhere. That is a long term goal for me.
> The example above uses a three-digit code number. My encoding space
> allows for various numbers of digits, with a minimum of three digits
> and a much larger theoretical maximum. The most digits in use at
> present in my research project in any one code number is six.
> William Overington
> Friday 14 February 2020
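
The three spellings described in the quoted message can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the base character is left as a parameter because the message does not name one, and the use of U+24EA CIRCLED DIGIT ZERO for the digit 0 is a guess, since the message shows only non-zero circled digits. The tag characters are the standard ones from the Unicode Tags block (U+E0021 TAG EXCLAMATION MARK, U+E0030 to U+E0039 TAG DIGIT ZERO to NINE, U+E007F CANCEL TAG).

```python
INTEGRAL = "\u222B"        # ∫, introducer of the circled-digit form
CANCEL_TAG = "\U000E007F"  # CANCEL TAG, closes the tag sequence
TAG_OFFSET = 0xE0000       # tag character = ASCII code point + 0xE0000

# Circled digits: U+2460..U+2468 are 1..9; zero is U+24EA (assumed here).
CIRCLED = {"0": "\u24EA"}
CIRCLED.update({str(d): chr(0x2460 + d - 1) for d in range(1, 10)})


def _check(code):
    # The message states a minimum of three digits per code number.
    if not (code.isascii() and code.isdigit() and len(code) >= 3):
        raise ValueError("code number must be at least three ASCII digits")


def circled_form(code):
    """Spell the code number as an integral sign plus circled digits."""
    _check(code)
    return INTEGRAL + "".join(CIRCLED[c] for c in code)


def ascii_form(code):
    """Spell the code number as '!' plus plain digits."""
    _check(code)
    return "!" + code


def tag_form(code, base):
    """Spell the code number as base + tag '!' + tag digits + cancel tag."""
    _check(code)
    tags = "".join(chr(TAG_OFFSET + ord(c)) for c in "!" + code)
    return base + tags + CANCEL_TAG


print(ascii_form("781"))
print(circled_form("781"))
print([f"U+{ord(c):04X}" for c in tag_form("781", "X")])
```

All three functions take the same digit string, which is the sense in which the message says the three examples have the same meaning. Whether receiving software treats them as equivalent is a separate question that the sketch does not address.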
