Deleting Lone Surrogates
Richard Wordingham
richard.wordingham at ntlworld.com
Sun Oct 4 18:14:13 CDT 2015
On Sun, 4 Oct 2015 14:29:16 -0700
"Asmus Freytag (t)" <asmus-inc at ix.netcom.com> wrote:
> On 10/4/2015 12:38 PM, Richard Wordingham wrote:
> The problem you are trying to solve is to allow editing on
> the code point level, or, if you will, the keystroke level.
> Generally, there will be a sweet spot for each language (and each
> user) with respect to what to erase or undo.
> For sequences that belong to a given language, you can pick the
> behavior that makes most sense in them, but for lone surrogates, by
> definition you are dealing with broken text that doesn't follow any
> conventions.
Who's 'you'? Customisation is frequently not available. In fact, I
don't recall seeing it on offer.
> It should also be something that doesn't occur commonly. So, for all
> of those reasons, I see no particular problem with giving that a
> "generic" behavior, which could be that of deleting the entire
> combining sequence; especially if your interface normally deletes
> sequences as a unit.
> But in any case, the minimal requirement on an editor is that it lets
> you delete (and then retype) enough text to get it back to an
> uncorrupted state.
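As a rough illustration of that minimal requirement, here is a sketch of a backspace operation on a buffer of UTF-16 code units. It removes a well-formed surrogate pair as one unit, but still lets a lone surrogate be deleted on its own. The buffer model (a list of integers) and the names is_high, is_low and backspace are assumptions made for this example; they are not taken from any particular editor.

```python
def is_high(u):
    """True if u is a high (leading) surrogate code unit."""
    return 0xD800 <= u <= 0xDBFF


def is_low(u):
    """True if u is a low (trailing) surrogate code unit."""
    return 0xDC00 <= u <= 0xDFFF


def backspace(units, caret):
    """Delete the code point before the caret.

    units is a list of UTF-16 code units and caret is an index with
    0 <= caret <= len(units). Returns (new_units, new_caret).
    """
    if caret <= 0:
        return list(units), 0
    start = caret - 1
    # A low surrogate preceded by a high surrogate is a valid pair:
    # remove both code units together.
    if is_low(units[start]) and start > 0 and is_high(units[start - 1]):
        start -= 1
    # Anything else, including a lone surrogate, is removed as one unit.
    return list(units[:start]) + list(units[caret:]), start
```

With the buffer [0x0061, 0xD804, 0xD804, 0xDF01] and the caret at the end, the first backspace removes the pair and the second removes the lone high surrogate that preceded it.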
In the problem I hit, I would very nearly be left with two options - never
having CANDRABINDU, or always having it preceded by a lone surrogate.
Whenever I enter CANDRABINDU, it is preceded by the lone surrogate.
Consequently, the option of retyping the sequence is of no avail.
Fortunately, in the application where I met the problem, the lone
surrogates, and nothing else, get deleted when the file is saved. The
problem could very easily be a lot worse.
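The save-time clean-up described above can be sketched as follows. This is only a guess at the behaviour, not the application's actual code: the function name strip_lone_surrogates and the list-of-code-units model are assumptions for the example.

```python
def strip_lone_surrogates(units):
    """Return a copy of a UTF-16 code unit list without unpaired surrogates.

    Well-formed high+low pairs and all non-surrogate units are kept.
    """
    out = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: keep it only if a low surrogate follows.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                out.extend((u, units[i + 1]))
                i += 2
            else:
                i += 1  # lone high surrogate, dropped
        elif 0xDC00 <= u <= 0xDFFF:
            i += 1  # lone low surrogate, dropped
        else:
            out.append(u)
            i += 1
    return out
```

For instance, taking U+11301 GRANTHA SIGN CANDRABINDU (UTF-16 D804 DF01) purely as an illustrative supplementary-plane character, the input [0xD804, 0xD804, 0xDF01] comes back as [0xD804, 0xDF01]: the stray leading surrogate is dropped and the character itself survives.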
----
> Catch-22 here. In filtering input to the dialog to prevent it from
> being used to corrupt text, you prevent it from being used to repair
> text. Interesting.
Not very different to having a very roll-stable aeroplane. If you ever
do end up upside-down, you have a big problem.
Richard.