block to script

Asmus Freytag via CLDR-Users cldr-users at unicode.org
Tue Feb 13 20:55:51 CST 2018


On 2/13/2018 6:38 PM, Martin Hosken via CLDR-Users wrote:
> Dear All,
>
> Is there a way to get from a UBlockCode to a UScriptCode?
>
> What? Aargh! No! Surely not! I hear you cry. But hold on a second. What I'm wanting to do is to add some (not perfect) future proofing to my application. When a new character is added to a block in Unicode, one can infer the script of that character, even if the character itself is unknown, from the block. But blocks get split! Yes they do. And this isn't a perfect solution. But block splits are rare, and this solution will give me a much better chance of an unknown character being handled 'appropriately' than being sure that the run break will break and having to wait however long until the next version of Unicode is released, ICU is updated and the application updated to that version of ICU.
>
> Hence my question :)

Very simply, count all the code points in the block that have a definite
script assignment that's not COMMON/INHERITED (and not unassigned).
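The counting step could be sketched in Python roughly as follows. This is a minimal illustration, not ICU code. `tally_block_scripts` and its `script_of` callback are hypothetical names, and the callback is assumed to return a script name, or None for an unassigned code point. With ICU4C, such a lookup could be built on uscript_getScript(). Common and Inherited are tallied as well, because the comparison described next needs their totals.

```python
from collections import Counter
from typing import Callable, Optional


def tally_block_scripts(
    first: int,
    last: int,
    script_of: Callable[[int], Optional[str]],
) -> Counter:
    """Count the assigned code points in [first, last] by script value.

    script_of is an assumed, caller-supplied lookup: it returns a script
    name such as "Latin", "Common" or "Inherited" for an assigned code
    point, and None for an unassigned one.
    """
    counts: Counter = Counter()
    for cp in range(first, last + 1):
        script = script_of(cp)
        if script is not None:  # unassigned code points carry no evidence
            counts[script] += 1
    return counts
```

For example, tally_block_scripts(0x0900, 0x097F, lookup) would tally the Devanagari block for whatever lookup the application supplies.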

If a single script far outweighs both the COMMON/INHERITED code points and
any other scripts, then "guessing" that a new character will end up with
that script assignment will give you results that are better than "random".

And even if a combining mark gets assigned to a free spot, in many
cases it makes no big difference whether you treat it as INHERITED or
as having the script of its base character (think script runs in a
complex script).

Your algorithm will also detect symbol and punctuation blocks, for which
it can predict COMMON as the likely script value.
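The decision rule in the last few paragraphs could be sketched as a function over per-block counts, that is, a mapping from script name to the number of assigned code points in the block. The name `guess_block_script`, the `margin` parameter and its default of 2 are assumptions standing in for "far outweighs". The function returns the dominant script, "Common" when Common/Inherited swamp every real script (symbol and punctuation blocks), or None when no guess is clearly better than random.

```python
from typing import Mapping, Optional

# Script values that do not identify a specific writing system.
NEUTRAL = {"Common", "Inherited"}


def guess_block_script(counts: Mapping[str, int], margin: float = 2.0) -> Optional[str]:
    """Guess the script a future character in a block will receive.

    counts maps script names to the number of assigned code points in the
    block. margin is an assumed threshold for "far outweighs".
    Returns None when no script is a clear winner.
    """
    neutral = sum(n for s, n in counts.items() if s in NEUTRAL)
    specific = sorted(
        ((n, s) for s, n in counts.items() if s not in NEUTRAL and n > 0),
        reverse=True,
    )
    if specific:
        top_count, top_script = specific[0]
        runner_up = specific[1][0] if len(specific) > 1 else 0
        # A single script must dominate both Common/Inherited and every rival.
        if top_count >= margin * max(neutral, runner_up):
            return top_script
    # Symbol and punctuation blocks: Common/Inherited swamp all real scripts.
    specific_total = sum(n for n, _ in specific)
    if neutral > 0 and neutral >= margin * specific_total:
        return "Common"
    return None
```

Raising margin trades coverage for fewer wrong guesses. Returning None rather than forcing a pick keeps a mixed block from being confidently misclassified.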

The best thing is that your guesses will get better with each revision:
when you upgrade your application, it will improve not only the data for
assigned code points but also the probabilistic guesses for some of the
unassigned ones.
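At run time, a per-version table of block guesses plays the role of the block-to-script mapping asked about in the original question. The following is a minimal sketch. It assumes the table of (first, last, guessed script) triples was generated offline by the counting procedure described above, for the Unicode version the application ships with, and regenerated on each upgrade. `BlockScriptGuesser` and `script_for` are hypothetical names. With ICU4C, `known_script` would come from uscript_getScript(), which reports Unknown (USCRIPT_UNKNOWN) for unassigned code points, and the table could be keyed by ublock_getCode() instead of by ranges.

```python
import bisect
from typing import Iterable, Optional, Tuple

# (first, last, guessed_script) for each block; guessed_script may be None.
BlockGuess = Tuple[int, int, Optional[str]]


class BlockScriptGuesser:
    """Fallback script lookup for code points the current data doesn't know."""

    def __init__(self, table: Iterable[BlockGuess]) -> None:
        # Blocks do not overlap, so sorting by start gives a searchable list.
        self._table = sorted(table, key=lambda entry: entry[0])
        self._starts = [first for first, _, _ in self._table]

    def script_for(self, cp: int, known_script: Optional[str] = None) -> str:
        # Prefer the real property value whenever the data has one.
        if known_script is not None:
            return known_script
        # Find the last block starting at or before cp.
        i = bisect.bisect_right(self._starts, cp) - 1
        if i >= 0:
            _, last, guess = self._table[i]
            if cp <= last and guess is not None:
                return guess
        # Outside every block, or no confident guess: report Unknown.
        return "Unknown"
```

Because the guesses only apply when the real value is missing, an upgrade that assigns the character simply overrides the guess.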

As long as you are aware that it's a probabilistic gamble, you should be 
fine.

Enjoy,

A./
>
> Yours,
> Martin
> _______________________________________________
> CLDR-Users mailing list
> CLDR-Users at unicode.org
> http://unicode.org/mailman/listinfo/cldr-users
>


