Unicode "no-op" Character?

Sławomir Osipiuk via Unicode unicode at unicode.org
Sat Jun 22 19:26:09 CDT 2019


I assure you, it wasn't very interesting. :-) Headache-y, more like. The
diacritic thing was completely inapplicable anyway, as all our text was
plain English. I really don't want to get into what the thing was, because
it sounds stupider the more I try to explain it. But it got the wheels
spinning in my head, and now that I've been reading up a lot about Unicode
and older standards like ISO 2022 and ISO 6429, it got me wondering whether
there might already be an elegant solution.

 

But, as an example I'm making up right now, imagine you want to packetize a
large string. The packets are not all the same size; the sizes are determined
by some algorithm, and a packet boundary may fall between a base char and
a diacritic. You insert markers into the string at the packet boundaries.
You can then store the string, copy it, display it, or pass it to the
sending function, which will scan the string and know to send the next packet
when it reaches a marker. And you can do all that without needing to
pass around extra metadata (like a list of ints saying where the packet
boundaries are supposed to be) or to re-calculate the boundaries; it's still
just a big string. If a different application sees the string, it will know
to completely ignore the packet markers; it can even strip them out if it
wants to (the canonical equivalent of the noop character is the absence of a
character).
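
To make the made-up example concrete, here is a minimal Python sketch. It is
only an illustration: the thread itself establishes that no true no-op
character exists, so ESC (U+001B) stands in as the marker, and the names
MARKER, insert_markers, packets and strip_markers are hypothetical. Packet
sizes are counted in code points and are assumed to come from some other
algorithm.

```python
# Assumption: ESC (U+001B) stands in for the wished-for no-op character.
MARKER = "\x1b"


def insert_markers(text, sizes):
    """Return text with MARKER inserted at each packet boundary.

    sizes lists packet lengths in code points; whatever text is left
    after the listed sizes becomes the final packet.
    """
    parts = []
    pos = 0
    for size in sizes:
        chunk = text[pos:pos + size]
        if chunk:  # skip empty chunks if sizes overrun the text
            parts.append(chunk)
        pos += size
    if pos < len(text):
        parts.append(text[pos:])
    return MARKER.join(parts)


def packets(marked):
    """What a sending function would do: split at the markers."""
    return marked.split(MARKER)


def strip_markers(marked):
    """What an unrelated application could do: drop the markers."""
    return marked.replace(MARKER, "")


# "cafe" + combining acute accent: the first boundary falls between
# the base char "e" and its diacritic.
text = "cafe\u0301 au lait"
marked = insert_markers(text, [4, 5])
```

Here packets(marked) gives the three packets "cafe", "\u0301 au " and "lait",
and strip_markers(marked) gives back the original string, so the boundaries
travel inside the string with no side list of offsets.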

 

As should be obvious, I'm not recommending this as good practice.

 

 

From: Shawn Steele [mailto:Shawn.Steele at microsoft.com] 
Sent: Saturday, June 22, 2019 19:57
To: Sławomir Osipiuk; unicode at unicode.org
Subject: RE: Unicode "no-op" Character?

 

+ the list.  For some reason the list's reply header is confusing.

 

From: Shawn Steele 
Sent: Saturday, June 22, 2019 4:55 PM
To: Sławomir Osipiuk <sosipiuk at gmail.com>
Subject: RE: Unicode "no-op" Character?

 

The original comment about putting it between the base character and the
combining diacritic seems peculiar.  I'm having a hard time visualizing how
that kind of markup could be interesting.

 

From: Unicode <unicode-bounces at unicode.org> On Behalf Of Slawomir Osipiuk
via Unicode
Sent: Saturday, June 22, 2019 2:02 PM
To: unicode at unicode.org
Subject: RE: Unicode "no-op" Character?

 

I see there is no such character, which I pretty much expected after Google
didn't help.

 

The original problem I had was solved long ago but the recent article about
watermarking reminded me of it, and my question was mostly out of curiosity.
The task wasn't, strictly speaking, about "padding", but about marking -
injecting "flag" characters at arbitrary points in a string without
affecting the resulting visible text. I think we ended up using ESC, which
is a dumb choice in retrospect, though the whole approach was a bit of a
hack anyway and the process it was for isn't being used anymore.
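
One way to see why an ordinary character like ESC is not a real no-op (this
is an illustration, not something the thread states as the reason it was a
dumb choice): a flag placed between a base character and a combining mark
stops Unicode normalization from composing the pair. The sketch below uses
only Python's standard unicodedata module; the variable names are made up.

```python
import unicodedata

ESC = "\x1b"  # the flag character the thread says was used

decomposed = "e\u0301"  # "e" followed by a combining acute accent
composed = unicodedata.normalize("NFC", decomposed)

# Put the flag between the base character and the diacritic.
flagged = "e" + ESC + "\u0301"
flagged_nfc = unicodedata.normalize("NFC", flagged)

# Without the flag, NFC composes the pair into U+00E9.
assert composed == "\u00e9"
# With the flag in between, NFC leaves the sequence as it was.
assert flagged_nfc == flagged
# Stripping the flag first restores the expected result.
assert unicodedata.normalize("NFC", flagged.replace(ESC, "")) == "\u00e9"
```

So the flagged string and the unflagged string are not canonically
equivalent, which is exactly the property a true no-op character would need
to have.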
