Pure Regular Expression Engines and Literal Clusters
Richard Wordingham via Unicode
unicode at unicode.org
Mon Oct 14 02:18:54 CDT 2019
On Sun, 13 Oct 2019 20:25:25 -0700
Asmus Freytag via Unicode <unicode at unicode.org> wrote:
> On 10/13/2019 6:38 PM, Richard Wordingham via Unicode wrote:
> On Sun, 13 Oct 2019 17:13:28 -0700
>> Yes. There is no precomposed LATIN LETTER M WITH CIRCUMFLEX, so
>> [:Lu:] should not match <U+004D LATIN CAPITAL LETTER M, U+0302
>> COMBINING CIRCUMFLEX ACCENT>.
> Why does it matter if it is precomposed? Why should it? (For anyone
> other than a character coding maven).
Because general_category is a property of characters, not strings. It
matters to anyone who intends to conform to a standard.
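
A minimal sketch of the point above, assuming Python's standard unicodedata module as the source of property data (the variable names are illustrative only). It looks up General_Category one code point at a time and checks that NFC leaves the sequence as two code points:

```python
import unicodedata

# <U+004D, U+0302>: capital M followed by a combining circumflex.
m_circumflex = "M\u0302"

# General_Category is looked up per code point, never per string.
categories = [unicodedata.category(ch) for ch in m_circumflex]
print(categories)

# NFC cannot fuse the pair, because no precomposed character exists.
nfc = unicodedata.normalize("NFC", m_circumflex)
print(len(nfc))
```

Only the first code point is Lu; the combining mark has its own category, so a test for Lu on the sequence as a whole has nothing to look up.
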
>> Now, I could invent a string
>> property so that \p{xLu} meant (?:\p{Lu}\p{Mn}*).
No, I shouldn't! \m{xLu} is infinite, which would not be allowed for
a Unicode set. I'd have to resort to a wordy definition for it to be a
property.
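
To illustrate why the set is infinite, here is a rough sketch of a matcher for an Lu character followed by any number of Mn characters. The function name match_xlu is hypothetical, and Python's unicodedata module is assumed for the category data; this is not part of any regular expression engine:

```python
import unicodedata

def match_xlu(text, pos=0):
    """Match one Lu character followed by zero or more Mn characters.

    Returns the index just past the match, or None if text[pos] is not Lu.
    The name and interface are hypothetical, for illustration only.
    """
    if pos >= len(text) or unicodedata.category(text[pos]) != "Lu":
        return None
    end = pos + 1
    # Absorb any number of nonspacing marks; there is no upper bound.
    while end < len(text) and unicodedata.category(text[end]) == "Mn":
        end += 1
    return end

print(match_xlu("M\u0302"))            # M plus one circumflex
print(match_xlu("M" + "\u0302" * 50))  # M plus fifty circumflexes
print(match_xlu("m\u0302"))            # lowercase base, so no match
```

Every extra mark yields another matching string, so the members cannot be enumerated as a finite set of strings.
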
Richard.