Unicode Properties and Canonical Equivalence
asmusf at ix.netcom.com
Wed Aug 17 06:49:45 CDT 2022
A process *may* treat two canonically equivalent sequences
differently. For example, when determining how to allocate buffers, any
length difference matters and may, at some point, surface to the user,
even if unintentionally.
This case seems somewhat equivalent.
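The buffer-allocation point can be made concrete with a small sketch. It assumes Python's standard unicodedata module and UTF-8 as the encoding; the helper name encoded_lengths is purely illustrative and is not taken from any specification.

```python
import unicodedata

def encoded_lengths(text):
    """Return the UTF-8 byte lengths of text in NFC and in NFD."""
    nfc = unicodedata.normalize("NFC", text)
    nfd = unicodedata.normalize("NFD", text)
    return len(nfc.encode("utf-8")), len(nfd.encode("utf-8"))

# U+00E9 (precomposed e with acute) and <U+0065, U+0301> are canonically
# equivalent, yet they need different amounts of storage.
nfc_len, nfd_len = encoded_lengths("\u00e9")
print(nfc_len, nfd_len)  # 2 3
```

A process that sizes its buffers from these lengths is treating the two forms differently without interpreting them differently.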
What the conformance clause intends is that processes (and protocols for
that matter) don't intentionally rely on the differences in encoding.
(However, for example, a protocol may require a particular normalization
form, while rejecting unnormalized data).
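As a rough illustration of a protocol that requires one normalization form, the sketch below rejects anything that is not already in NFC. The function name accept_nfc_only is hypothetical, and this is not the actual IDNA2008 procedure, only the general shape of such a check.

```python
import unicodedata

def accept_nfc_only(label):
    """Hypothetical protocol check: reject input that is not already in NFC."""
    if unicodedata.normalize("NFC", label) != label:
        raise ValueError("input is not in Normalization Form NFC")
    return label

accept_nfc_only("\u00e9")        # accepted: already in NFC
try:
    accept_nfc_only("e\u0301")   # canonically equivalent, but decomposed
except ValueError as err:
    print("rejected:", err)
```

Such a check does distinguish between canonically equivalent inputs, but only to enforce a single representation, not to assign them different meanings.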
[If people feel that this is forbidden by the current conformance
clause, we would have serious troubles with protocols like IDNA2008
which enforce Normalization Form NFC for representation of data at
A minor infidelity in script run parsing doesn't appear to rise to the
level of concern that was the focus of the conformance clause about
treating different normalizations differently.
That said, it's strongly preferable to design properties with closure
under normalization, but edge cases like this need to be handled with
some understanding of what the costs and benefits are of trying to
implement such a guarantee.
On 8/16/2022 1:10 AM, Richard Wordingham via Unicode wrote:
> On Mon, 15 Aug 2022 11:38:24 -0700
> Markus Scherer via Unicode<unicode at corp.unicode.org> wrote:
>> ... and which
>> value you think we should change to what other value.
> I wasn't suggesting that values may be changed, though my question may
> constitute evidence that some values should be changed. My question
> was as to how we should handle the anomalies while complying with
> conformance requirement C6 in TUS Section 3.2. Perhaps some
> Unicode properties are simply inconsistent with that requirement. If
> anything should be changed, perhaps it is the guidance on regular