Unicode Properties and Canonical Equivalence

Asmus Freytag asmusf at ix.netcom.com
Wed Aug 17 06:49:45 CDT 2022


A process *may* treat two canonically equivalent sequences
differently. For example, when determining how to allocate buffers, any
length difference matters and may, at some point, surface to the user,
even if not intentionally.
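
To make the buffer point concrete, here is a minimal sketch in Python using the standard unicodedata module. The sample strings and the helper name utf8_len are my own choices for illustration, not anything from the standard.

```python
import unicodedata

# Two canonically equivalent spellings of the same text (illustrative sample).
composed = "\u00e9"      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT


def utf8_len(s: str) -> int:
    """Number of bytes a buffer would need for s in UTF-8 (hypothetical helper)."""
    return len(s.encode("utf-8"))


# Normalizing the decomposed spelling to NFC yields the composed one,
# which is what canonical equivalence means for this pair.
same_text = unicodedata.normalize("NFC", decomposed) == composed

# The lengths nevertheless differ, both in code points and in UTF-8 bytes.
lengths = (len(composed), len(decomposed))
byte_lengths = (utf8_len(composed), utf8_len(decomposed))
```

A process that sizes a buffer from either length is treating the two sequences differently without interpreting them differently.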

This case seems somewhat equivalent.

What the conformance clause intends is that processes (and protocols for 
that matter) don't intentionally rely on the differences in encoding. 
(However, for example, a protocol may require a particular normalization 
form, while rejecting unnormalized data).

[If people feel that this is forbidden by the current conformance 
clause, we would have serious troubles with protocols like IDNA2008 
which enforce Normalization Form NFC for representation of data at 
certain interfaces.]
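
As a rough sketch of the "require a normalization form, reject everything else" pattern: the function name accept_nfc_only is hypothetical, and this is not the actual IDNA2008 processing, which applies many more rules than a normalization check. It assumes Python 3.8 or later for unicodedata.is_normalized.

```python
import unicodedata


def accept_nfc_only(label: str) -> str:
    """Return label unchanged if it is in NFC, otherwise reject it.

    Hypothetical interface check: the data is not silently repaired,
    unnormalized input is simply refused.
    """
    if not unicodedata.is_normalized("NFC", label):
        raise ValueError("input is not in Normalization Form NFC")
    return label


def is_accepted(label: str) -> bool:
    """Convenience wrapper that reports acceptance as a boolean."""
    try:
        accept_nfc_only(label)
        return True
    except ValueError:
        return False
```

With this check, "caf\u00e9" would be accepted and "cafe\u0301" rejected, even though the two are canonically equivalent. The interface discriminates on form, not on interpretation.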

A minor infidelity in script run parsing doesn't appear to rise to the
level of concern that was the focus of the conformance clause about
treating different normalizations differently.

That said, it's strongly preferable to design properties with closure
under normalization, but edge cases like this need to be handled with
some understanding of the costs and benefits of trying to implement
such a guarantee.
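
One way to get a feel for where such a guarantee fails is to probe a string-level function against the normalization forms of some samples. The sketch below is only that, a probe over whatever samples you supply, not a proof of closure; the name is_stable_under_normalization is made up for illustration, and it assumes the function returns hashable values.

```python
import unicodedata


def is_stable_under_normalization(func, samples):
    """Check that func gives one result for each sample and its NFC/NFD forms.

    Hypothetical helper: returns False as soon as some sample produces
    different results across its canonically equivalent spellings.
    """
    for s in samples:
        results = {func(s)}
        for form in ("NFC", "NFD"):
            results.add(func(unicodedata.normalize(form, s)))
        if len(results) > 1:
            return False
    return True


samples = ["\u00e9", "e\u0301", "abc"]

# Code point length depends on the spelling, so it is not stable.
length_is_stable = is_stable_under_normalization(len, samples)

# Normalizing first makes any downstream result stable by construction.
nfc_is_stable = is_stable_under_normalization(
    lambda s: unicodedata.normalize("NFC", s), samples
)
```

Running the real property lookup of interest through a check like this over the relevant repertoire would at least enumerate the edge cases, which is the starting point for weighing cost against benefit.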

A./




On 8/16/2022 1:10 AM, Richard Wordingham via Unicode wrote:
> On Mon, 15 Aug 2022 11:38:24 -0700
> Markus Scherer via Unicode <unicode at corp.unicode.org> wrote:
>
>> ... and which
>> value you think we should change to what other value.
> I wasn't suggesting that values may be changed, though my question may
> constitute evidence that some values should be changed.  My question
> was as to how we should handle the anomalies while complying with
> conformance requirement C6 in TUS Section 3.2.  Perhaps some
> Unicode properties are simply inconsistent with that requirement. If
> anything should be changed, perhaps it is the guidance on regular
> expressions.
>
> Richard.
