Normalization Generics (NFx, NFKx, NFxy)

Zach Lym indolering at gmail.com
Tue Dec 15 14:04:00 CST 2020


Okay, so points for pedantry ... but do you have any input on adding
normalization generics to Unicode pseudocode?

Or would you like to split this discussion out into a new topic?

On Tue, Dec 15, 2020 at 11:21 AM Richard Wordingham via Unicode
<unicode at unicode.org> wrote:
>
> On Sun, 13 Dec 2020 20:08:08 -0800
> Zach Lym via Unicode <unicode at unicode.org> wrote:
>
> > > What does that quoted statement mean?  I'm having a hard job working
> > > out what the meaning of full case folding is.  I'm not having any
> > > doubts about the meaning of toCasefold(NFD(X)), so there is no issue
> > > for 'canonical caseless matching'.
> >
> > The "case folding is closed under canonical normalization" or the
> > other part?
>
> That part.
>
> > Closed as in closure:
> > https://en.wikipedia.org/wiki/Closure_(mathematics)
>
> That only tells me what it means for a _set_ to be closed under an
> operation.  What does it mean for a _function_ (or similar) to be
> closed under an operation?
>
> If I must use the definition for a set, then I can only conclude that
> for one operation to be closed under another operation, the result
> should be independent of the order in which they are applied.
>
> But for X = <U+1FB3 GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI, U+0342
> COMBINING GREEK PERISPOMENI>:
>
> NFD(toCasefold(X)) = <U+03B1 GREEK SMALL LETTER ALPHA, U+03B9 GREEK
> SMALL LETTER IOTA, U+0342>
>
> toCasefold(NFD(X)) = <U+03B1, U+0342, U+03B9>
>
> NFC(toCasefold(X)) = <U+03B1, U+1FD6 GREEK SMALL LETTER IOTA WITH
> PERISPOMENI>
>
> toCasefold(NFC(X)) = <U+1FB6 GREEK SMALL LETTER ALPHA WITH PERISPOMENI,
> U+03B9>
>
> So either "case folding is closed under canonical normalization" means
> something else, or it is simply not true.
>
> > Refer to page 240 of the standard, Chapter 5 "Implementation
> > Guidelines" Section 18 "Case Mappings":
> >
> > http://www.unicode.org/versions/latest/ch05.pdf
>
> Why?
>
> The trick is not to be deflected by the opening paragraph in TUS
> Section 3.13, but to read on to find R4.
>
> Richard.
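
For anyone who wants to recompute the four sequences quoted above, here
is a minimal Python sketch. It assumes that str.casefold() is an
acceptable stand-in for toCasefold (full case folding) and that
unicodedata.normalize() supplies NFD and NFC. The helper names
(code_points, fold_then_normalize, normalize_then_fold) are made up for
illustration, and the printed sequences depend on the Unicode data
bundled with the interpreter, so treat the output as something to
compare against the message rather than as authoritative.

```python
import unicodedata


def code_points(s: str) -> str:
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


def fold_then_normalize(form: str, s: str) -> str:
    """Compute NFx(toCasefold(s)), using str.casefold() as toCasefold."""
    return unicodedata.normalize(form, s.casefold())


def normalize_then_fold(form: str, s: str) -> str:
    """Compute toCasefold(NFx(s)), using str.casefold() as toCasefold."""
    return unicodedata.normalize(form, s).casefold()


# X = <U+1FB3 GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI,
#      U+0342 COMBINING GREEK PERISPOMENI>
X = "\u1FB3\u0342"

for form in ("NFD", "NFC"):
    a = fold_then_normalize(form, X)
    b = normalize_then_fold(form, X)
    print(f"{form}(toCasefold(X)) = <{code_points(a)}>")
    print(f"toCasefold({form}(X)) = <{code_points(b)}>")
    print(f"  same result in both orders: {a == b}")
```

If the two orders disagree for either form, then folding and
normalizing do not commute for this X, which is the point being made
in the quoted message.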

