Swapcase for Titlecase characters

Martin J. Dürst duerst at it.aoyama.ac.jp
Fri Mar 18 02:43:56 CDT 2016


I'm working on extending the case conversion methods for the programming 
language Ruby from the current ASCII-only support to cover all of Unicode.

Ruby comes with four methods for case conversion. Three of them, upcase, 
downcase, and capitalize, are quite clear. But we have hit a question 
for the fourth method, swapcase.

What swapcase does is swap upper and lower case, so that e.g.

'Unicode Standard'.swapcase => 'uNICODE sTANDARD'

I'm not sure myself where this method is actually used, but it also 
exists in Python (and maybe Ruby got it from there).


Now the question I have is: What to do for titlecase characters? Several 
possibilities have already been floated:

a) Leave as is, because they are neither upper nor lower case.

b) Convert to upper (or lower), which may simplify implementation.

c) Decompose the character into upper and lower case components, and 
apply swapcase to these.


For example, 'Džinsi' (jeans) would become 'DžINSI' with a), 'DŽINSI' (or 
'džINSI') with b), and 'dŽINSI' with c). For another example, 'ᾨδή' would 
become 'ᾨΔΉ' with a), 'ὨΙΔΉ' (or 'ᾠΔΉ') with b), and 'ὠΙΔΉ' with c).
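
To make the three options concrete, here is a minimal Ruby sketch. It is 
only an illustration, not a proposed implementation: the TITLECASE table 
and the swapcase_with method are hypothetical names, and the table is 
filled in by hand for the single character U+01C5 rather than derived from 
Unicode data. Characters outside the table go through the built-in 
swapcase, so non-ASCII letters are only swapped on a Ruby version whose 
swapcase already handles Unicode.

```ruby
# Hypothetical hand-written table for one titlecase character, U+01C5
# (capital D with small z with caron). Real code would need Unicode data.
TITLECASE = {
  "\u01C5" => {
    upper:   "\u01C4",   # full uppercase digraph
    lower:   "\u01C6",   # full lowercase digraph
    swapped: "d\u017D"   # components D + z-caron, each swapped: d + Z-caron
  }
}.freeze

# strategy: :keep is option a), :upper / :lower are option b),
# :decompose is option c).
def swapcase_with(str, strategy)
  str.each_char.map do |ch|
    entry = TITLECASE[ch]
    next ch.swapcase unless entry   # ordinary character: built-in swapcase

    case strategy
    when :keep      then ch
    when :upper     then entry[:upper]
    when :lower     then entry[:lower]
    when :decompose then entry[:swapped]
    else raise ArgumentError, "unknown strategy: #{strategy}"
    end
  end.join
end

word = "\u01C5insi"   # the 'jeans' example from above
[:keep, :upper, :lower, :decompose].each do |strategy|
  puts "#{strategy}: #{swapcase_with(word, strategy)}"
end
```

Note that only c) changes the length of the string in characters, which 
an implementation working in place would have to account for.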

It looks like Python 3 (3.4.3 in my case) is doing a). My guess is that 
from a user-expectation point of view, c) is best, so I'm tending to go 
for c). There is no existing data from the Unicode Standard for this, 
but it seems pretty straightforward.

But before I just implement something, I'd appreciate additional input, 
in particular from users closer to the affected language communities.

Regards,   Martin.

