Unicode "no-op" Character?

Sławomir Osipiuk via Unicode unicode at unicode.org
Mon Jun 24 11:47:58 CDT 2019

It's discardable outside of the context/process that created it.
For a receiving process there is a difference between "this character has a
meaning you don't understand" and "this character had a transitory meaning
that has been exhausted".
The first implies that it needs to be preserved and survive round-trip
transmission (in fact the Unicode standard requires that). The second
implies that it can be discarded.
The first implies that it should be displayed to the user even if only as an
"unknown something here". The second implies it should be ignored completely
in display.

Noncharacters have a use as internal-only sentinels, but they are difficult
for an intermediate process to use if the text it receives already contains
them (http://www.unicode.org/faq/private_use.html#nonchar10) and they break
up combinations (they have a display effect, even if it's a subtle one).
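To make the "already contains them" problem concrete, here is a minimal Python sketch of the check an intermediate process would need before it could safely use a noncharacter as its own sentinel. The function names are hypothetical; the ranges are the 66 noncharacters (U+FDD0..U+FDEF plus the last two code points of each plane).

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the 66 Unicode noncharacters."""
    # U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF at the end of every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def find_noncharacters(text: str) -> list[int]:
    """Return the indices of noncharacters already present in received text."""
    return [i for i, ch in enumerate(text) if is_noncharacter(ord(ch))]


# If this list is non-empty, the process cannot tell its own sentinels
# apart from the ones that arrived with the text.
received = "abc\uFFFFdef"
print(find_noncharacters(received))
```
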

Private Use Characters are nice but they are still "part of" the text; if
they are removed, the text is semantically changed. And they too display as
something. I have to go back to how the SYN control character is defined.
ECMA-16/ISO 1745 says "SYN is generally removed at the receiving Terminal
Installation." It has a transitory purpose that is exhausted as soon as it
is received. I wish Unicode hadn't shied away from either formalizing SYN or
providing some kind of equivalent. I know it wasn't part of the scope
Unicode set for itself, but I can still dream.
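
As a rough illustration of that "removed on receipt" behavior, here is a minimal Python sketch. The receive function is hypothetical, and treating U+0016 as discardable is the ECMA-16 convention assumed here, not something the Unicode Standard specifies. It also shows the "breaks up combinations" effect: while the control character sits between a base letter and a combining mark, normalization cannot compose them.

```python
import unicodedata

SYN = "\u0016"  # SYNCHRONOUS IDLE (C0 control)


def receive(text: str) -> str:
    """Drop SYN on receipt; its transitory purpose is assumed exhausted."""
    return text.replace(SYN, "")


# "e" + SYN + COMBINING ACUTE ACCENT: the control sits inside a sequence.
wire = "e" + SYN + "\u0301"

# With SYN still present, NFC cannot compose the pair.
assert unicodedata.normalize("NFC", wire) == wire

# After the receiver discards SYN, the text composes to U+00E9 as usual.
assert unicodedata.normalize("NFC", receive(wire)) == "\u00e9"
```
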

-----Original Message-----
From: Shawn Steele [mailto:Shawn.Steele at microsoft.com] 
Sent: Monday, June 24, 2019 01:39
To: Sławomir Osipiuk; unicode at unicode.org
Cc: 'Richard Wordingham'
Subject: RE: Unicode "no-op" Character?

But... it's not actually discardable.  The hypothetical "packet"
architecture (using the term architecture somewhat loosely) needed the
information being tunneled in by this character.  If it was actually
discardable, then the "noop" character wouldn't be required as it would be

Since the character conveys meaning to some parts of the system, then it's
not actually a "noop" and it's not actually "discardable".  

What is actually being requested isn't a character that has no meaning to
anyone, but rather a character that has no PUBLIC meaning.  

Which leads us to the key.  The desire is for a character that has no public
meaning, but has some sort of private meaning.  In other words it has a
private use.  Oddly enough, there is a group of characters intended for
private use, in the PUA ;-)

Of course if the PUA characters interfered with the processing of the
string, they'd need to be stripped, but you're sort of already in that
position by having a private flag in the middle of a string.
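
A minimal Python sketch of that stripping step, assuming the private flag is any code point with General Category Co (private use). The name strip_private_use is a hypothetical helper, and a real system would more likely strip only the specific code points it assigned itself, since other PUA characters may belong to the text.

```python
import unicodedata


def strip_private_use(text: str) -> str:
    """Remove every private-use code point (General Category 'Co')."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Co")


# U+E000 (BMP PUA) and U+F0000 (plane 15 PUA) used as private flags.
flagged = "ab\uE000cd\U000F0000"
print(strip_private_use(flagged))
```
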

