Problems with org.unicode.cldr.tool.ShowKeyboards

Richard Wordingham richard.wordingham at ntlworld.com
Sun Mar 28 14:11:43 CDT 2021


The LDML specification makes using it to document keyboards seem like a
good idea.  So I have been trying to document some of my own.

I have been having trouble devising an identifier for my X-SAMPA
keyboard.  Its purpose is that one can type in IPA using the X-SAMPA
ASCIIfication and get out IPA in Normal Form C.  It has been extended
slightly to support capital letters and other diacritics that one
encounters in transliteration.

1. My first attempt used "und-t-k0".  The tool objected that I should
rather use the language "en".  I then tried "en-t-k0", which triggered
the exception:

java.lang.StringIndexOutOfBoundsException: begin 0, end -1, length 7
	at java.lang.String.checkBoundsBeginEnd(java.base@9-internal/String.java:3119)
	at java.lang.String.substring(java.base@9-internal/String.java:1907)
	at org.unicode.cldr.tool.ShowKeyboards$Id.<init>(ShowKeyboards.java:811)
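
The "begin 0, end -1, length 7" text is what String.substring(0, -1)
reports for a 7-character string, and "en-t-k0" has 7 characters.  That
is consistent with, though it does not prove, a call of the form
id.substring(0, id.indexOf(...)) in which the search fails.  A minimal
sketch of that failure mode and of a guarded alternative follows; the
class name, the method names and the "-t-k0-" marker are assumptions
made for illustration, not the code at ShowKeyboards.java:811.

```java
// Hypothetical sketch; NOT the actual ShowKeyboards source.
final class KeyboardIdSketch {
    // Assumed marker separating the locale part of an ID from the platform part.
    static final String MARKER = "-t-k0-";

    // Unguarded: indexOf returns -1 when the marker is absent,
    // so substring(0, -1) throws StringIndexOutOfBoundsException.
    static String localeUnguarded(String id) {
        return id.substring(0, id.indexOf(MARKER));
    }

    // Guarded: report the missing platform instead of failing a bounds check.
    static String localeGuarded(String id) {
        int pos = id.indexOf(MARKER);
        if (pos < 0) {
            throw new IllegalArgumentException(
                "Keyboard ID '" + id + "' has no platform after -t-k0");
        }
        return id.substring(0, pos);
    }

    public static void main(String[] args) {
        System.out.println(localeGuarded("pi-Latn-t-k0-ubuntu"));
        try {
            localeUnguarded("en-t-k0");
        } catch (StringIndexOutOfBoundsException e) {
            // The exact message text depends on the Java version.
            System.out.println(e.getMessage());
        }
    }
}
```

If something like this is the cause, an ID with no platform after
"-t-k0" would deserve a diagnostic that names the missing part.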

2. I then changed tack, and asked myself, "For which language do you
most often switch to this keyboard?".  The current answer is 'Pali', so
I tried pi-Latn-t-k0-ubuntu, pi-Latn-t-k0-ubuntu.  In each case I got a
non-terminating exception,
 "org.unicode.cldr.draft.Keyboard$KeyboardException: Bad locale tag:
pi-Latn-t-k0-ubuntu, [No minimal data for:pi_Latn]".  Are keyboards not
allowed for Pali?

3. The question of which vendor's system the keyboard is targeted at is
difficult.  It's being used on Linux, but 'debian' or 'Ubuntu' might be
a more useful answer.  The actual coding of the keyboard comes in three
flavours, Keyman for Linux (KMfL), emacs (or quail) and M17N.  KMfL is
the simplest, but only works/worked with the iBus input manager, while
M17N should work for both iBus and fcitx.  It's not at all clear how I
should reflect this in the keyboard identity.

4. The error message for loose text within elements ('PCDATA') is less
helpful than it could be.  For example,

Caused by: org.xml.sax.SAXParseException; systemId:
file:///home/richard/unicode/cldr/38/keyboards/und/pi-t-k0-ubuntu.xml;
lineNumber: 91; columnNumber: 12; The content of element type "keyMap"
must match "(map|flicks)+".

tells one (by elimination) that there is such text somewhere in an
element of type "keyMap" that ends on line 91.  That is of limited
help when an element has 1500 lines, as has happened to me.  (Being
new to the game, I had to eliminate misplaced elements or misspelt
element names - they give different errors.)  Unfortunately, this
error message seems not to be under the control of the CLDR project.
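
Since the parser's message cannot be changed, a separate pre-check is
one way to narrow the search.  The following is a minimal sketch using
the standard SAX API: it records a line number for each chunk of
non-whitespace text found directly inside a named element.  The class
and method names are hypothetical and this is not part of the CLDR
tools.  The reported line is where the parser finished reading the
chunk, so it is near the stray text rather than guaranteed to be
exactly on it.  Real keyboard files reference a DTD, which the parser
must be able to resolve.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: finds loose text directly inside a given element.
final class StrayTextFinder extends DefaultHandler {
    private final String target;                          // element name to watch
    private final Deque<String> open = new ArrayDeque<>(); // currently open elements
    private final List<Integer> lines = new ArrayList<>();
    private Locator locator;

    StrayTextFinder(String target) { this.target = target; }

    @Override public void setDocumentLocator(Locator l) { locator = l; }

    @Override public void startElement(String uri, String local, String qName, Attributes a) {
        open.push(qName);
    }

    @Override public void endElement(String uri, String local, String qName) {
        open.pop();
    }

    @Override public void characters(char[] ch, int start, int length) {
        // Only text whose immediate parent is the target element counts.
        boolean inTarget = !open.isEmpty() && target.equals(open.peek());
        if (inTarget && !new String(ch, start, length).trim().isEmpty()) {
            lines.add(locator == null ? -1 : locator.getLineNumber());
        }
    }

    // Returns line numbers near each chunk of stray text in 'element'.
    static List<Integer> find(String xml, String element) {
        StrayTextFinder handler = new StrayTextFinder(element);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse input", e);
        }
        return handler.lines;
    }

    public static void main(String[] args) {
        String xml = "<keyMap>\n<map/>\n<map/>x<map/>\n</keyMap>";
        System.out.println(find(xml, "keyMap"));
    }
}
```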

--

Despite the warning about my being wicked enough to create a Pali
keyboard, the charts and tables were produced for the keyboard.
However, there are numerous lurking issues:

5. The layout chart shows only 95 graphic symbols (including space).
Are there any plans to chart 'dead key' combinations and the like?
(This may not be a trivial exercise.)

6. Most of the keys are shown as being dead keys, though the design
intent is that they are not treated as dead keys - the 'default' option
is intended, as opposed to 'settings/transformPartial="hide"'.  The
keyboard format provides no way to note this!

7. Typing the key labelled 'A' with shift enabled is intended to
generate the character U+0251 LATIN SMALL LETTER ALPHA; only on typing
a backslash does it change to U+0041 LATIN CAPITAL LETTER A.  As the
technology used assumes a mnemonic keyboard (in so far as it doesn't
simply assume a US English keyboard), this is implemented as:

    <map iso="C01" to="A"/>
...
    <transform from="A"     to="ɑ"/>
    <transform from="A\\"   to="A"/>

The two transforms are in the 'type=simple' transforms element.

Should not the tool raise an eyebrow at this?  I feel the charts ought
to display the C01 key as producing 'ɑ'.

Even more seriously, the tool seems to deduce just from the map element
above that the keyboard can produce the letter 'A'.
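
To make the suggestion concrete, here is a minimal sketch of how a
chart generator might take simple transforms into account when
labelling a key.  Everything in it is an assumption for illustration:
the class and method names are hypothetical, it is not how
ShowKeyboards works, and it treats the "\\" in the from attribute as
denoting a single backslash.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not taken from the CLDR tools.
final class EffectiveOutputSketch {
    // The two simple transforms from the example above.
    static Map<String, String> sampleTransforms() {
        Map<String, String> t = new LinkedHashMap<>();
        t.put("A", "\u0251");  // A alone becomes U+0251 LATIN SMALL LETTER ALPHA
        t.put("A\\", "A");     // A followed by a backslash becomes A
        return t;
    }

    // What a chart could show for a key: if the key's mapped output is
    // itself the complete 'from' of a transform, show the transform's
    // result; otherwise show the mapped output unchanged.
    static String effectiveOutput(String mapTo, Map<String, String> transforms) {
        return transforms.getOrDefault(mapTo, mapTo);
    }

    public static void main(String[] args) {
        Map<String, String> t = sampleTransforms();
        System.out.println(effectiveOutput("A", t)); // key C01 with shift
        System.out.println(effectiveOutput("b", t)); // a key with no transform
    }
}
```

Under these assumptions the C01 key would be charted as U+0251, and
'A' would count as producible only because of the second transform,
not because of the map element alone.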

8. Is there a list of available tools for capturing various keyboards
in CLDR notation, for example converting a .klc file from MSKLC and, as
more of a niche product, converting a .mim file from M17N?

If you feel tickets on CLDR should be raised, please advise how the
issues should be grouped.

Richard.


