Meroitic cursive fractions numerical values

Karl Williamson public at khwilliamson.com
Mon Mar 30 14:38:54 CDT 2015


On 03/29/2015 03:41 AM, Andrew West wrote:
> On 28 March 2015 at 20:05, Karl Williamson <public at khwilliamson.com> wrote:
>>
>> Existing software that looks at the numeric values of characters is written
>> expecting that rational numbers will have been reduced to their lowest form.
>
> That seems to be a rather rash statement. I have software (BabelPad)
> which parses the numeric values of characters for numeric sorting
> purposes, and it parses "6/12" for MEROITIC CURSIVE FRACTION SIX
> TWELFTHS as 0.5. Personally I find it hard to imagine how you could
> write software that accepts "6/12" as input and is unable to come up
> with the answer of a half.
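A minimal Python sketch of the kind of numeric sorting Andrew describes. This is not BabelPad's actual code, which the thread doesn't show. Each UCD Numeric_Value string is parsed exactly with the standard-library fractions.Fraction, so "6/12" compares equal to 1/2. The helper name numeric_sort_key and the sample data are illustrative assumptions.

```python
from fractions import Fraction

def numeric_sort_key(ucd_value: str) -> Fraction:
    # Fraction parses "6/12" or "1" exactly and reduces it, so
    # "6/12" and "1/2" produce equal sort keys.
    return Fraction(ucd_value)

# Illustrative sample: character name -> Numeric_Value string as in the UCD
values = {
    "DIGIT ONE": "1",
    "MEROITIC CURSIVE FRACTION SIX TWELFTHS": "6/12",
    "VULGAR FRACTION ZERO THIRDS": "0",
}
ordered = sorted(values, key=lambda name: numeric_sort_key(values[name]))
print(ordered)
```

Sorting by the exact value never needs a floating-point division, so reduced and unreduced spellings of the same number sort together.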

The statement is not rash, as it is simply a statement of objective 
fact.  I am the maintainer of software that fails with beta 8.0 due to 
this change.  And it has nothing to do with not being able to do 
arithmetic division; your assumption was wrong.

The software essentially creates a database of Unicode properties for 
regular expression pattern matching, so that someone can say

   /\p{Numeric_Value=0.5}/

and quickly determine if the matched string contains a code point with 
that characteristic.  Because the database is copied as-is to many 
different computers with different word sizes and different floating 
point implementations, it can't do the division ahead of time because of 
the inherent fuzziness of floating point numbers.  It solves this the 
same way Unicode has, by leaving rational numbers in their original 
precisely specified format.  Thus it creates a table for the 
property-value combination of Numeric_Value and 1/2, taking the UCD 
value as-is.
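A small Python illustration of why the division isn't done ahead of time. It is a sketch under assumptions, not the software described here. Python's fractions.Fraction stands in for whatever exact representation a real implementation would use. A double computed from 1/3 is only an approximation of one third, while a rational parsed from the UCD string "1/3" is exact.

```python
from fractions import Fraction

def exact_value(ucd_value: str) -> Fraction:
    # Parse "n/d" directly into an exact rational; no floating-point division.
    return Fraction(ucd_value)

stored_float = 1 / 3          # what "dividing ahead of time" would store
exact = exact_value("1/3")

# Fraction(float) converts the double exactly, exposing that it is not 1/3.
print(Fraction(stored_float) == exact)   # False
print(exact == Fraction(1, 3))           # True
```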

Prior to beta 8, the UCD came with all fractions already reduced.  It 
would not occur to someone with a mainly mathematical or computer 
science background that the input data would come otherwise, as the 
mathematical convention is to write fractions in lowest terms.  Even 
though Unicode doesn't promise this, there is, of course, no code to 
handle the new case.  The code thus creates a second table for the 
property-value combination of Numeric_Value and 6/12, which causes 
problems: two tables now describe the same number.
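A hypothetical Python sketch of the failure. If tables are keyed on the Numeric_Value string exactly as it appears in the UCD, the beta 8.0 value "6/12" lands in its own table instead of joining the 1/2 table. The function name build_numeric_value_tables and the sample entries are illustrative only.

```python
from collections import defaultdict

def build_numeric_value_tables(entries):
    # Hypothetical builder: one table per Numeric_Value, keyed on the
    # UCD string verbatim (no reduction, no division).
    tables = defaultdict(set)
    for name, ucd_value in entries:
        tables[ucd_value].add(name)
    return dict(tables)

entries = [
    ("VULGAR FRACTION ONE HALF", "1/2"),
    ("MEROITIC CURSIVE FRACTION SIX TWELFTHS", "6/12"),  # as in the beta 8.0 data
]
tables = build_numeric_value_tables(entries)

# Two tables for the same number; the 1/2 table misses the Meroitic character.
print(sorted(tables))
```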

It's a small matter to add code to reduce the UCD-specified rational 
numbers, but it's just one more complication to have to deal with along 
with the many that the UCD already presents, and if there is not a good 
reason the data for these new characters is specified contrary to 
mathematical convention, then the data should be changed instead of 
having to code around it.
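One way to do that reduction, sketched in Python. The sketch assumes Numeric_Value strings are either integers or "n/d", and the helper names are hypothetical. Fraction reduces to lowest terms, and str() renders whole numbers without a denominator, so "0/3" comes out as "0".

```python
from collections import defaultdict
from fractions import Fraction

def canonical_numeric_value(ucd_value: str) -> str:
    # Fraction parses the string exactly and stores it in lowest terms
    # with a positive denominator; str() omits a denominator of 1.
    return str(Fraction(ucd_value))

def build_numeric_value_tables(entries):
    # Hypothetical table builder: key each table on the reduced value.
    tables = defaultdict(set)
    for name, ucd_value in entries:
        tables[canonical_numeric_value(ucd_value)].add(name)
    return dict(tables)

tables = build_numeric_value_tables([
    ("VULGAR FRACTION ONE HALF", "1/2"),
    ("MEROITIC CURSIVE FRACTION SIX TWELFTHS", "6/12"),
])
print(canonical_numeric_value("0/3"))   # 0
print(sorted(tables))                   # ['1/2']
```

Reducing once, when the data is read, keeps the rest of the table-building code unchanged.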
>
> I would say that fractions should not be reduced to their lowest form
> in the Unicode data as some people may need to order fractions by
> numerator or denominator, and reducing to lowest form could break the
> expectations of some software.  Having said that, I note that the
> numeric value of one character has been reduced in the Unicode data:
> U+2189 VULGAR FRACTION ZERO THIRDS is given the numeric value of "0"
> rather than "0/3".

So there is some precedent for reducing.


>
> Andrew


