Unicode "no-op" Character?
Philippe Verdy via Unicode
unicode at unicode.org
Wed Jul 3 03:55:23 CDT 2019
Also consider that C0 controls (like STX and ETX) can already be used for
packetizing, but the need for escaping arises immediately (DLE has been used
for that purpose, placed just before the character to preserve in the stream
content, notably before DLE itself, STX, or ETX).
There is then no need at all for any new character in Unicode. But if your
protocol does not allow any form of escaping, then it is broken, as it
cannot transport **all** valid Unicode text.
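
As a rough illustration of the DLE-based escaping described above, here is a
minimal Python sketch. The function names frame() and unframe() are
hypothetical, and the layout (STX, escaped payload, ETX) is only an assumed
convention, not the format of any particular protocol. It shows that once
escaping is allowed, the payload may contain STX, ETX and DLE themselves.

```python
# C0 controls used as packet delimiters and escape character.
STX, ETX, DLE = "\x02", "\x03", "\x10"
SPECIAL = (STX, ETX, DLE)


def frame(text: str) -> str:
    """Wrap text as STX ... ETX, prefixing each STX, ETX or DLE
    found in the payload with DLE so it is read as plain content."""
    escaped = "".join(DLE + ch if ch in SPECIAL else ch for ch in text)
    return STX + escaped + ETX


def unframe(packet: str) -> str:
    """Reverse frame(): strip the delimiters and drop each escaping DLE."""
    if len(packet) < 2 or packet[0] != STX or packet[-1] != ETX:
        raise ValueError("packet must start with STX and end with ETX")
    body = packet[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == DLE:
            # The character after DLE is payload, whatever it is.
            i += 1
            if i >= len(body):
                raise ValueError("DLE at end of payload escapes nothing")
            out.append(body[i])
        elif ch in (STX, ETX):
            raise ValueError("unescaped delimiter inside payload")
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```

With this scheme unframe(frame(s)) gives back s for any string s, including
strings that contain the delimiters, which is the property a protocol without
escaping cannot offer.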
On Wed, 3 Jul 2019 at 10:49, Philippe Verdy <verdy_p at wanadoo.fr> wrote:
> On Wed, 3 Jul 2019 at 06:09, Sławomir Osipiuk <sosipiuk at gmail.com>
> wrote:
>> I don’t think you understood me at all. I can packetize a string with any
>> character that is guaranteed not to appear in the text.
> Your goal is **impossible** to reach with Unicode. Assume such a character
> is "added" to the UCS; then it can appear in the text. Your goal, that it
> be "guaranteed" not to be used in any text, means that your "character"
> cannot be encoded at all. Unicode and ISO **require** that any proposed
> character can be used in text without limitation. Logically, it would be
> rejected because your character would not be usable at all from the
> start.
> So you have no choice: you must use some transport format for your
> "packeting", just like what is used in MIME for emails, in HTTP(S) for
> streaming, or in internationalized domain names.
> For your escaping mechanism you already have a very large choice of
> characters considered special only for your chosen transport syntax.
> Your goal shows a chicken-and-egg problem. It is not solvable without
> creating self-contradictions immediately (and if you attempt to add some
> restriction to avoid the contradiction, then you'll run into cases where
> you can no longer transport your message, and your protocol will become
> unusable).