Internationalised Computer Science Exercises

Richard Wordingham via Unicode unicode at unicode.org
Sat Jan 27 22:12:30 CST 2018


On Sat, 27 Jan 2018 14:13:40 -0800
Shervin Afshar <shervinafshar at gmail.com> wrote:

> On Mon, Jan 22, 2018 at 2:08 PM, Richard Wordingham via Unicode <
> unicode at unicode.org> wrote:  

> > On Mon, 22 Jan 2018 at 16:39:57, Andre Schappo via Unicode <  
> > unicode at unicode.org> wrote:  

> > > By way of example, one programming challenge I set to students a
> > > couple of weeks ago involves diacritics. Please see
> > > https://jsfiddle.net/coas/wda45gLp/

> > Did any of them come up with the idea of using traces instead of
> > strings?

> Care to elaborate? Are you referring to sequence alignment methods?

No, I'm thinking of the trace monoid (see e.g.
https://en.wikipedia.org/wiki/Trace_monoid).  One way of thinking of
strings is as concatenations of the NFD decompositions of their
constituent characters. Then the canonical equivalence classes of these
strings form the trace monoid of indecomposable characters.  The theory
of regular expressions (though you may not think that mathematical
regular expressions matter) extends to trace monoids, with the
disturbing exception that the Kleene star of a regular language is not
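necessarily regular.  (The prototypical example is the set of sequences
(xy)^n where x and y are distinct and commute, i.e. xy and yx are
canonically equivalent in Unicode terms.  A Unicode example is the set of
strings composed only of U+0F73 TIBETAN VOWEL SIGN II, which decomposes
canonically to U+0F71 U+0F72; those two have different non-zero combining
classes and so commute, and there is no FSM that will recognise exactly the
strings canonically equivalent to members of that set.)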
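To make the Tibetan example concrete, here is a minimal sketch using Python's standard `unicodedata` module. It assumes canonical equivalence is tested by comparing NFD forms. The recogniser `equivalent_to_power_of_ii` is a hypothetical helper written for illustration, not anything from the post. It accepts exactly the strings canonically equivalent to U+0F73 repeated n times, and it needs an unbounded counter to do so. That counter is precisely what a finite-state machine lacks.

```python
import unicodedata

VOWEL_AA = "\u0f71"  # TIBETAN VOWEL SIGN AA, canonical combining class 129
VOWEL_I = "\u0f72"   # TIBETAN VOWEL SIGN I, canonical combining class 130
VOWEL_II = "\u0f73"  # TIBETAN VOWEL SIGN II, canonically decomposes to AA + I


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent iff their NFD forms are equal."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def equivalent_to_power_of_ii(s: str) -> bool:
    """Hypothetical recogniser: is s canonically equivalent to VOWEL_II * n for some n >= 0?

    All three characters are non-starters, so canonical reordering can move
    them past one another freely; only the counts of AA and I matter. The
    check therefore needs an unbounded counter, which no FSM has.
    """
    balance = 0
    for ch in s:
        if ch == VOWEL_AA:
            balance += 1
        elif ch == VOWEL_I:
            balance -= 1
        elif ch != VOWEL_II:  # VOWEL_II adds one AA and one I, so balance is unchanged
            return False
    return balance == 0
```

For example, `canonically_equivalent(VOWEL_I + VOWEL_AA, VOWEL_II)` holds because NFD reorders the pair by combining class. Meanwhile `VOWEL_I * k + VOWEL_AA * k` is accepted for every k, so the acceptor must distinguish infinitely many prefix states.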

One consequence of this view is that one has to think of U+1EAD LATIN
SMALL LETTER A WITH CIRCUMFLEX AND DOT BELOW (ậ) as being both composed of
the Vietnamese vowel letter U+00E2 LATIN SMALL LETTER A WITH CIRCUMFLEX
(â) and the tone mark U+0323 COMBINING DOT BELOW, and also composed, in
the spirit of Thai ISO 11940 transliteration, of the transliterated Thai
vowel U+1EA1 LATIN SMALL LETTER A WITH DOT BELOW (ạ), corresponding to
U+0E31 THAI CHARACTER MAI HAN-AKAT, and the tone mark U+0302 COMBINING
CIRCUMFLEX ACCENT, corresponding to U+0E49 THAI CHARACTER MAI THO.  (In
ISO 11940 as specified, the tone mark is actually written on the
immediately preceding consonant, not on the vowel.)
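A small sketch with Python's `unicodedata` module can check that both factorisations of ậ land in the same canonical equivalence class. The constant names below are illustrative only. The two marks commute because U+0323 has combining class 220 and U+0302 has combining class 230. NFD therefore sorts them into a single order whichever way the string was built.

```python
import unicodedata

PRECOMPOSED = "\u1ead"        # U+1EAD ậ
VIETNAMESE = "\u00e2\u0323"   # â (vowel letter) + COMBINING DOT BELOW (tone mark)
THAI_STYLE = "\u1ea1\u0302"   # ạ (vowel) + COMBINING CIRCUMFLEX ACCENT (tone mark)


def nfd(s: str) -> str:
    return unicodedata.normalize("NFD", s)


def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


def code_points(s: str) -> str:
    return " ".join(f"U+{ord(c):04X}" for c in s)


for label, s in [("precomposed", PRECOMPOSED), ("Vietnamese", VIETNAMESE), ("ISO 11940 style", THAI_STYLE)]:
    print(f"{label:16} NFD = {code_points(nfd(s))}  NFC = {code_points(nfc(s))}")

# Different non-zero combining classes, so the two marks commute under canonical reordering.
print(unicodedata.combining("\u0323"), unicodedata.combining("\u0302"))
```

Each line reports NFD = U+0061 U+0323 U+0302 and NFC = U+1EAD. The character database itself records only one of the two factorisations: `unicodedata.decomposition("\u1ead")` gives `'1EA1 0302'`. The Vietnamese factorisation is reachable only through reordering.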

Richard.
