Unicode "no-op" Character?

Shawn Steele via Unicode unicode at unicode.org
Mon Jun 24 00:39:15 CDT 2019


But... it's not actually discardable.  The hypothetical "packet" architecture (using the term architecture somewhat loosely) needed the information being tunneled in by this character.  If it were actually discardable, then the "no-op" character wouldn't be required, as it would be discarded.

Since the character conveys meaning to some parts of the system, it's not actually a "no-op" and it's not actually "discardable".

What is actually being requested isn't a character that has no meaning to anybody, but rather a character that has no PUBLIC meaning.

Which leads us to the key.  The desire is for a character that has no public meaning, but has some sort of private meaning.  In other words it has a private use.  Oddly enough, there is a group of characters intended for private use, in the PUA ;-)

Of course if the PUA characters interfered with the processing of the string, they'd need to be stripped, but you're sort of already in that position by having a private flag in the middle of a string.
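To make the PUA suggestion concrete, here is a minimal Python sketch of a process that tunnels a private flag through a string and strips it again before handing the text to anyone else. The marker U+E000 and the helper names (tag, is_private_use, strip_private_use) are purely hypothetical choices for illustration; nothing in this thread or in the standard prescribes them. The three ranges are the Private Use Areas in the BMP and in planes 15 and 16.

```python
# Private Use Area ranges: BMP, plane 15, plane 16.
PUA_RANGES = (
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
)

# Hypothetical private flag; any PUA code point agreed on internally would do.
MARKER = "\uE000"


def is_private_use(ch: str) -> bool:
    """Return True if the single character ch lies in a Private Use Area."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


def tag(text: str, position: int) -> str:
    """Insert the private marker at the given index (internal use only)."""
    return text[:position] + MARKER + text[position:]


def strip_private_use(text: str) -> str:
    """Remove every PUA character, e.g. before public interchange."""
    return "".join(ch for ch in text if not is_private_use(ch))


if __name__ == "__main__":
    tagged = tag("packet", 3)
    print(len(tagged))                # one code point longer than "packet"
    print(strip_private_use(tagged))  # the original text, marker removed
```

Note that strip_private_use removes all PUA characters, including any that arrived from another process with a different private meaning, so a real system would have to decide whether that is acceptable.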

-Shawn  

-----Original Message-----
From: Unicode <unicode-bounces at unicode.org> On Behalf Of Slawomir Osipiuk via Unicode
Sent: Saturday, June 22, 2019 6:10 PM
To: unicode at unicode.org
Cc: 'Richard Wordingham' <richard.wordingham at ntlworld.com>
Subject: RE: Unicode "no-op" Character?

That's the key to the no-op idea. The no-op character could not ever be assumed to survive interchange with another process. It'd be canonically equivalent to the absence of character. It could be added or removed at any position by a Unicode-conformant process. A program could wipe all the no-ops from a string it has received, and insert its own for its own purposes. (In fact, it should wipe the old ones so as not to confuse itself.) It's "another process's discardable junk" unless known, internally-only, to be meaningful at a particular stage.

While all the various (non)joiners/ignorables are interesting, none of them have this property.

In fact, that might be the best description: It's not just an "ignorable", it's a "discardable". Unicode doesn't have that, does it?

-----Original Message-----
From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Richard Wordingham via Unicode
Sent: Saturday, June 22, 2019 20:59
To: unicode at unicode.org
Cc: Shawn Steele
Subject: Re: Unicode "no-op" Character?

If they're conveying an invisible message, one would have to strip out original ZWNBSP/WJ/ZWSP that didn't affect line-breaking.  The weak point is that that assumes that line-break opportunities are well-defined.  For example, they aren't for SE Asian text.

Richard.
