"textels" (was: Default character encoding for each operating system?)

Janusz S. Bień jsbien at mimuw.edu.pl
Thu Sep 15 14:12:53 CDT 2016


On Thu, Sep 15 2016 at 16:36 CEST, john.w.kennedy at gmail.com writes:

[...]

> In the new Swift programming language, which is white-hot in the Apple
> community, Apple is moving toward a model of a transparent, generic
> Unicode that can be “viewed” as UTF-8, UTF-16, or UTF-32 if necessary,
> but in which a “character” contains however many code points it needs
> (“e” with a stacked macron, acute accent, and dieresis is
> algorithmically one “character” in Swift). Moreover,
> e-with-an-acute-accent and e followed by a combining acute accent, for
> example, compare as equal. At present, the underlying code is still
> UTF-16LE.

For several years I have been using the name "textel" (text element, in
Polish "tekstel") for such objects. I use it mostly orally, in
presentations for my students, but I have also used it in writing, e.g.
in http://bc.klf.uw.edu.pl/118/, unfortunately without a proper
definition. I provided a rudimentary definition only in my
recent paper in Polish: http://bc.klf.uw.edu.pl/480/. It states simply
(on p. 69) "an elementary text element independently of its Unicode
representation" (in particular, regardless of whether it is a combining
sequence or a precomposed character). I still hope to formulate a more
satisfactory definition sooner or later :-)

I think Swift confirms that such a notion is really needed.
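
To make the idea concrete, here is a minimal sketch in Python (not Swift,
and not a proper definition). The names same_textel and textels are
hypothetical, introduced only for this illustration. The comparison
relies on NFC normalization, and the splitting rule (a base code point
plus the combining marks that follow it) is a deliberate simplification
of the grapheme cluster rules in UAX #29.

```python
import unicodedata


def same_textel(a, b):
    """Compare two strings up to canonical equivalence (NFC on both sides)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def textels(text):
    """Roughly split text into base code points with their combining marks.

    Assumption: a code point with a non-zero canonical combining class
    attaches to the preceding element. Real segmentation (UAX #29) has
    more rules than this.
    """
    out = []
    for ch in text:
        if out and unicodedata.combining(ch) != 0:
            out[-1] += ch  # attach the combining mark to the current element
        else:
            out.append(ch)  # start a new element
    return out


precomposed = "\u00e9"            # e with acute, one code point
decomposed = "e\u0301"            # e followed by combining acute accent
stacked = "e\u0304\u0301\u0308"   # e with macron, acute accent and dieresis

print(precomposed == decomposed)                   # False: code points differ
print(same_textel(precomposed, decomposed))        # True
print(len(decomposed), len(textels(decomposed)))   # 2 1
print(len(stacked), len(textels(stacked)))         # 4 1
```

Under these assumptions both spellings of the accented "e" count as the
same single element, whatever the number of code points used.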

Best regards

Janusz

-- 
                           ,   
Prof. dr hab. Janusz S. Bien -  Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department)
jsbien at uw.edu.pl, jsbien at mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/


