Dealing with Georgian capitalization in programming languages

Markus Scherer via Unicode unicode at unicode.org
Tue Oct 2 15:12:36 CDT 2018


On Tue, Oct 2, 2018 at 12:50 AM Martin J. Dürst via Unicode <
unicode at unicode.org> wrote:

> ... The only
> operation that can cause problems is 'capitalize'.
>
> When I say "cause problems", I mean producing mixed-case output. I
> originally thought that 'capitalize' would be fine. It is fine for
> lowercase input: It stays lowercase because Unicode Data indicates that
> titlecase for lowercase Georgian letters is the letter itself. But it
> will produce the apparently undesirable Mixed Case for ALL UPPERCASE input.
>
> My questions here are:
> - Has this been considered when Georgian Mtavruli was discussed in the
>    UTC?
> - How have any other implementers (ICU,...) addressed this, in
>    particular the operation that's called 'capitalize' in Ruby?
>

By default, the ICU toTitle() functions titlecase the first character at
each word boundary (after adjusting the boundary) and lowercase everything
else. That is, we implement rule R3, toTitlecase(X), from section 3.13
(Default Case Algorithms) of the Unicode Standard, except that we modified
the default boundary adjustment.
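
To make the default behavior concrete, here is a minimal Python sketch of the
R3 idea. It is not ICU code: the helper names are hypothetical, the regular
expression \w+ is only a crude stand-in for real word segmentation, and
is_cased() merely approximates the Cased property. It assumes a Python whose
character database includes Mtavruli (Unicode 11 or later).

```python
import re

def is_cased(ch):
    # Approximation of the Unicode "Cased" property (assumption):
    # treat a character as cased if some case mapping changes it.
    return ch != ch.lower() or ch != ch.upper()

def titlecase_word(word):
    # Titlecase the first cased character, lowercase everything after it.
    for i, ch in enumerate(word):
        if is_cased(ch):
            return word[:i] + ch.title() + word[i + 1:].lower()
    return word  # no cased character: leave the word alone

def to_titlecase_default(text):
    # r"\w+" is a crude stand-in for UAX #29 word segmentation.
    return re.sub(r"\w+", lambda m: titlecase_word(m.group(0)), text)

MKHEDRULI = "\u10d7\u10d1\u10d8\u10da\u10d8\u10e1\u10d8"  # lowercase Georgian
MTAVRULI = "\u1c97\u1c91\u1c98\u1c9a\u1c98\u1ca1\u1c98"   # same word, uppercase

# Lowercase input is expected to come back unchanged.
print(to_titlecase_default(MKHEDRULI) == MKHEDRULI)
# All-uppercase input is expected to come back mixed:
# the first letter stays Mtavruli, the rest becomes Mkhedruli.
print(to_titlecase_default(MTAVRULI) == MTAVRULI[0] + MKHEDRULI[1:])
```

The second comparison is the mixed-case result Martin describes: the
titlecase mapping of a Mtavruli letter is the letter itself, while the
lowercasing of the remaining letters maps them to Mkhedruli.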

You can customize the boundaries (e.g., only the start of the string).
We have options for whether and how to adjust the boundaries (e.g., adjust
to the next cased letter) and for copying the other characters unchanged
instead of lowercasing them.
See the CaseMap class in C++ and Java and its options.
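
As a rough illustration of how those choices interact, the following Python
sketch treats the whole string as a single segment, adjusts to the first
cased letter, and lets the caller decide whether the remaining characters
are lowercased or copied. The function capitalize_start and its
lowercase_rest parameter are hypothetical and are not the ICU CaseMap API.
The sketch assumes a Python with Unicode 11 or later data.

```python
def capitalize_start(text, lowercase_rest=True):
    # Hypothetical helper, not an ICU function.
    # Adjust to the first cased character in the string.
    for i, ch in enumerate(text):
        # Approximate "cased": some case mapping changes the character.
        if ch != ch.lower() or ch != ch.upper():
            rest = text[i + 1:]
            if lowercase_rest:
                rest = rest.lower()  # default-style: lowercase all else
            return text[:i] + ch.title() + rest
    return text  # nothing cased, nothing to do

MKHEDRULI = "\u10d7\u10d1\u10d8\u10da\u10d8\u10e1\u10d8"  # lowercase Georgian
MTAVRULI = "\u1c97\u1c91\u1c98\u1c9a\u1c98\u1ca1\u1c98"   # same word, uppercase

# Lowercasing the rest is expected to turn all-Mtavruli input into mixed case.
print(capitalize_start(MTAVRULI) == MTAVRULI[0] + MKHEDRULI[1:])
# Copying the rest is expected to leave all-Mtavruli input unchanged.
print(capitalize_start(MTAVRULI, lowercase_rest=False) == MTAVRULI)
```

With the copying variant, all-uppercase Georgian input passes through
unchanged, at the cost of no longer normalizing input such as "hELLO" in
other scripts.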

markus