Pure Regular Expression Engines and Literal Clusters

Richard Wordingham via Unicode unicode at unicode.org
Mon Oct 14 02:18:54 CDT 2019


On Sun, 13 Oct 2019 20:25:25 -0700
Asmus Freytag via Unicode <unicode at unicode.org> wrote:

> On 10/13/2019 6:38 PM, Richard Wordingham via Unicode wrote:
> On Sun, 13 Oct 2019 17:13:28 -0700

>> Yes.  There is no precomposed LATIN LETTER M WITH CIRCUMFLEX, so
>> [:Lu:] should not match <U+004D LATIN CAPITAL LETTER M, U+0302
>> COMBINING CIRCUMFLEX ACCENT>. 

> Why does it matter if it is precomposed? Why should it? (For anyone
> other than a character coding maven).

Because general_category is a property of characters, not strings.  It
matters to anyone who intends to conform to a standard.
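
A minimal sketch of the point, using Python's standard unicodedata module
(the helper name categories is made up for illustration): the general
category is looked up one code point at a time, and NFC has no single
character to compose <U+004D, U+0302> into.

```python
import unicodedata

def categories(s):
    # General_Category is defined per code point, so look each one up in turn.
    return [unicodedata.category(ch) for ch in s]

m_circumflex = "M\u0302"   # <U+004D, U+0302>
a_circumflex = "A\u0302"   # <U+0041, U+0302>

# The sequence is an Lu character followed by an Mn character.
print(categories(m_circumflex))

# NFC leaves M + circumflex as two code points, whereas A + circumflex
# composes to the single character U+00C2, whose category is Lu.
print(len(unicodedata.normalize("NFC", m_circumflex)))
print(categories(unicodedata.normalize("NFC", a_circumflex)))
```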

>> Now, I could invent a string
>> property such that \p{xLu} meant (?:\p{Lu}\p{Mn}*).

No, I shouldn't!  \m{xLu} is infinite, which would not be allowed for
a Unicode set.  I'd have to resort to a wordy definition for it to be a
property.
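
To illustrate why the set is infinite: an uppercase letter followed by any
number of nonspacing marks matches, so the strings cannot be listed, only
recognised. A rough sketch of such a recogniser, assuming Python's
unicodedata module (match_xlu is a hypothetical name, not part of any
standard or library):

```python
import unicodedata

def match_xlu(text, pos=0):
    # Recognise one Lu character followed by zero or more Mn characters,
    # starting at pos. Returns the end index of the match, or None.
    if pos >= len(text) or unicodedata.category(text[pos]) != "Lu":
        return None
    end = pos + 1
    while end < len(text) and unicodedata.category(text[end]) == "Mn":
        end += 1
    return end

# Every extra combining mark gives another, longer matching string.
for n in range(4):
    candidate = "M" + "\u0302" * n
    print(n, match_xlu(candidate) == len(candidate))
```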

Richard.

