Aw: Re: Dealing with Georgian capitalization in programming languages

Marius Spix via Unicode unicode at unicode.org
Tue Oct 9 03:22:25 CDT 2018


The capital ẞ (U+1E9E) has been officially approved by the Council for German Orthography since June 2017. However, no word starts with ß, which means the character is only relevant for fully capitalized words. It may only stand alone in spaced type, when no italic font style is available.

I see in the Ruby bug tracker that there is also an issue with Dutch ij → IJ. The dedicated ligatures Ĳ (U+0132) and ĳ (U+0133) are not recommended and thus never used, but a leading ij must always be capitalized to IJ, as in IJSBERG → ijsberg → IJsberg. The actual problem is that the current capitalization algorithm is based on a regular grammar (type 3). It would have to be adjusted for a context-sensitive (type 1) grammar.
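
To illustrate the Dutch case, here is a minimal sketch of a capitalization helper that looks at the second letter before deciding how to capitalize the first. The name capitalize_dutch is hypothetical and not part of Ruby's String API, and the sketch assumes the input is a single word spelled with the plain letters i and j rather than the ligatures.

```ruby
# Hypothetical helper, not part of Ruby's String API.
# Assumes a single word written with plain "i" + "j", not the ligatures.
def capitalize_dutch(word)
  lower = word.downcase
  if lower.start_with?("ij")
    # A leading "ij" is treated as one unit: both letters become upper case.
    "IJ" + lower[2..-1]
  else
    # Otherwise fall back to the default per-character capitalization.
    lower.capitalize
  end
end

puts capitalize_dutch("IJSBERG")   # IJsberg
puts capitalize_dutch("amsterdam") # Amsterdam
```

Whether such a rule applies depends on the language of the text, so a real implementation would need a locale or language option rather than applying it unconditionally.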

Regards,

Marius

 

On 2018/10/09 09:47, Martin J. Dürst wrote:

> I have been thinking through this. It seems quite appealing.
> 
> But I'm concerned there may be some edge cases. I have been able to come
> up with two so far:
> 
> - Applying this to a string starting with upper-case SZ (U+1E9E).
> This may change SZ → ß → Ss.
> - Using the 'capitalize' method to (try to) get the titlecase
> property of a MTAVRULI character. (There's no other way
> currently in Ruby to get the titlecase property.)
> 
> There may be others. If you have some ideas, I'd appreciate to know
> about them.
> 
> This lets me wonder why the UTC didn't simply declare the titlecase
> property of MTAVRULI to be mkhedruli. Was this considered or not? The
> way things are currently set up, there seems to be no benefit of
> MTAVRULI being its own titlecase, because in actual use, that requires
> additional processing.
> 
> Regards, Martin.


