doug at ewellic.org
Sun Mar 15 17:50:13 CDT 2015
Luke Dashjr <luke at dashjr dot org> wrote:
> That is, 100 decimal is "one hundred" with a binary value of 110 0100.
> But the same "100" in tonal would be "san" with a binary value of
> 1 0000 0000.
"100" with the meaning of "one hundred" is spoken as "ciento" in
Spanish, "ekatón" in Greek, "sto" in Russian, etc. So pronunciation by
itself doesn't necessarily justify separate encoding.
Within English-speaking contexts, "100" can also be a binary number, or
an octal number with a binary value of 100 0000. In my world as a
developer, it's often a hex number, as in tonal. In most of these cases
it's typically pronounced "one zero zero" or "one oh oh." So the numeric
value of a string of digits within a positional system also doesn't
necessarily justify separate encoding.
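
As a quick illustration of the point above, here is a minimal Python sketch that reads the same digit string "100" in four bases and prints the resulting binary values. The helper name grouped_binary is hypothetical and used only for this example. The four-bit grouping just mirrors the way the values are written in this thread.

```python
def grouped_binary(value: int) -> str:
    """Return the binary digits of value, grouped in fours from the right."""
    bits = format(value, "b")
    # Pad on the left to a multiple of four so the groups align from the right.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    # Drop the padding zeros (and any emptied leading group) again.
    return " ".join(groups).lstrip("0 ") or "0"


# The same three characters, interpreted in four positional systems.
readings = {base: grouped_binary(int("100", base)) for base in (2, 8, 10, 16)}

for base, bits in readings.items():
    print(f'"100" in base {base:>2} = {int("100", base):>3} decimal = {bits} binary')
```

The characters are identical in every case. Only the base supplied by the caller, which is to say the context, changes the value.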
TTS systems always have to rely on environmental hints. Anyone who has
worked on them will agree.
> And in the other example, one is "B with double lines" vs "bitcoins".
As David pointed out, currency symbols really aren’t analogous to
anything else. They are never built from combining characters, and are
never decomposable to them. This has nothing really to do with TTS or
pronunciation. One person in the Ubuntu thread mentioned pronunciation,
but that is not the primary reason.
Doug Ewell | http://ewellic.org | Thornton, CO