Pure Regular Expression Engines and Literal Clusters
Richard Wordingham via Unicode
unicode at unicode.org
Fri Oct 11 13:18:46 CDT 2019
On Fri, 11 Oct 2019 12:39:56 +0200
Elizabeth Mattijsen via Unicode <unicode at unicode.org> wrote:
> Furthermore, Perl 6 uses Normalization Form Grapheme for matching:
I seriously doubt that a Thai considers each combination of consonant
(44), non-spacing vowel (7) and tone mark (4) a different character.
Moreover, if what you say is correct, perl6 will be useless for
finding such combinations in correctly spelled text.  The regular expression
would find only misspellings because in correct Thai spelling, matching
sequences constitute grapheme clusters. I trust perl6 will actually
continue to support analyses of strings as sequences of codepoints.
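
To illustrate the point, here is a minimal sketch in Python rather than
Perl 6, since Python's standard `re` module matches codepoint by codepoint.
The three character classes are an assumption: ranges from the Thai block
chosen to agree with the counts given above (44 consonants, 7 non-spacing
vowels, 4 tone marks). The name `find_combinations` is hypothetical.

```python
import re

# Assumed classes, picked to match the counts quoted in the message.
CONSONANT = "[\u0E01-\u0E23\u0E25\u0E27-\u0E2E]"  # 44: KO KAI..HO NOKHUK, minus RU and LU
VOWEL = "[\u0E31\u0E34-\u0E39]"                    # 7: MAI HAN-AKAT, SARA I..SARA UU
TONE = "[\u0E48-\u0E4B]"                           # 4: MAI EK..MAI CHATTAWA

# A consonant, then a non-spacing vowel, then a tone mark, as three codepoints.
PATTERN = re.compile(CONSONANT + VOWEL + TONE)


def find_combinations(text):
    """Return every consonant + vowel + tone mark sequence found in text."""
    return [m.group(0) for m in PATTERN.finditer(text)]


# KO KAI, SARA I, MAI EK, NGO NGU: a correctly spelled word of four codepoints.
sample = "\u0E01\u0E34\u0E48\u0E07"
matches = find_combinations(sample)
```

Here the first three codepoints of `sample` match because the engine sees
codepoints. An engine that treats each grapheme cluster as one indivisible
character would have nothing for the three-element pattern to match in this
correctly spelled word, which is the concern raised above.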