Another take on the English Apostrophe in Unicode

Marcel Schneider charupdate at
Mon Jun 15 09:49:45 CDT 2015

On Fri, Jun 12, 2015, Philippe Verdy wrote:

> These are application shortcuts, but these modifier key combinations are used with the base function keys (F1...F12), not with keys on the alphanumeric part of the keyboard. So there's no conflict.

Thank you for your advice. It'll be very useful.
I was not precise enough: the upper row of the alphanumeric block is used with Ctrl, Shift+Ctrl and Shift+Alt by the language bar, but only optionally.

> It is normal then not to assign non-control characters to CTRL+keys or CONTROL+SHIFT+keys (independently of the Caps Lock state) if the same keys are used to type non-control ASCII characters in the range U+0040..U+005F. This means that 32 positions on the keyboard must not be used for any assignment.
> The same remark applies to ALT+digit and ALT+letter (otherwise keyboard shortcuts for application menus or navigation in web forms won't work correctly, or will take priority when you intended to type a valid character, forcing these application functions instead of accepting your character input).
MSKLC performs these "safety checks" and will issue warnings if you do so.
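
The 32 reserved positions mentioned above follow from the usual ASCII convention that Ctrl clears the 0x40 bit of a character in U+0040..U+005F. A minimal sketch of that convention, where the function name `ctrl_char` and the list `CTRL_POSITIONS` are illustrative only and not part of MSKLC or any keyboard driver:

```python
def ctrl_char(key: str) -> str:
    """Return the ASCII control character conventionally produced by Ctrl+key.

    Assumption: the classic convention where the key's character lies in
    U+0040..U+005F (letters are uppercased first) and the control code is
    that code point with the 0x40 bit cleared.
    """
    upper = key.upper()
    if len(key) != 1 or len(upper) != 1:
        raise ValueError("expected a single character")
    code = ord(upper)
    if not 0x40 <= code <= 0x5F:
        raise ValueError(f"U+{code:04X} has no ASCII control counterpart")
    return chr(code & 0x1F)


# The 32 characters whose Ctrl combinations yield U+0000..U+001F.
CTRL_POSITIONS = [chr(c) for c in range(0x40, 0x60)]

if __name__ == "__main__":
    for ch in CTRL_POSITIONS:
        print(f"Ctrl+{ch} -> U+{ord(ctrl_char(ch)):04X}")
```

Under this convention Ctrl+@ gives NUL and Ctrl+[ gives ESC, which is why a layout that places "@" or "[" on a key implicitly reserves that key's Control state.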

The Alt shift state cannot be assigned in MSKLC. When Alt combinations are used for shortcuts with Clavier+, those shortcuts take priority and work fine.

> This is not just "my" advice but documented in the ISO standard.

That depends on which ISO Standard you refer to. If it's ISO/IEC 9995, then beware! IMHO this standard isn't to be taken seriously, otherwise you'll have to stay away from using the Shift + AltGr shift state, to take just one outstanding example.

> Assigning characters to positions defined for application shortcuts is a bad idea. Keyboard layouts should map characters in positions that are independent of applications (but layouts may be specific to an OS if the OS interface defines some standard shortcuts: this is a problem when using virtualized OSes, as there's a conflict with shortcuts used to switch from the guest to the host; personally I have chosen the Application key for this instead of the right control, because the Application key is rarely needed, but I frequently type control with the right hand or two hands, notably CTRL+A, CTRL+C, CTRL+X, CTRL+V).

It's indeed very useful to keep two Control modifiers, because the modifiers at the left and right borders of the block are operated with the little fingers and should thus be symmetrical. This does not apply to the Alt keys and other keys more or less centered around the space bar, which are operated with the thumbs. As Alt is used less than Kana (when there is a Kana key), Kana should be on left Alt, symmetrical to the AltGr key (already implemented on many keyboards). The Alt key then moves to the Applications key, which is mnemonic because of the contextual menu icon: internally, the Alt keys (left and right) are called Menu keys (virtual keys VK_LMENU, for Left Menu, and VK_RMENU). The contextual menu is then invoked by pressing the right Windows key, which is consistently missing on laptops. Laptops must nevertheless have an Applications key to prevent the AltGr key from being positioned too far to the right, next to an overly long space bar, because this hardware layout has a negative impact on ergonomics, specialists say.
On the US keyboard layout at however, Applications is a Kana toggle, while Right Windows is a Compose key. For laptops this shifts rightwards, giving Compose on Applications and the Kana toggle on, well, Right Control. Because there are laptops with nothing between Right Alt and Right Control, I even thought of mapping the Kana toggle to Pause, but this turned out to be buggy; besides, keyboards without an Applications (Menu) key often lack the Pause key too.
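
The first remapping described above (Kana on left Alt, Alt on Applications, contextual menu on right Windows) can be written down as a small table. This is only a sketch: the virtual-key values are the documented Windows constants, but the physical key labels and the `REMAP` table are hypothetical names chosen for illustration, and using VK_LMENU for the relocated Alt key is an assumption. A real layout would express this in the driver's scan-code table instead.

```python
# Documented Windows virtual-key codes.
VK_KANA = 0x15    # Kana modifier / toggle
VK_RWIN = 0x5C    # right Windows key
VK_APPS = 0x5D    # Applications (contextual menu) key
VK_LMENU = 0xA4   # left Alt ("Left Menu")
VK_RMENU = 0xA5   # right Alt ("Right Menu"), AltGr on many layouts

# Hypothetical labels for physical keys, mapped to the virtual key each
# one would produce under the remapping described in the message.
REMAP = {
    "LeftAlt": VK_KANA,        # Kana mirrors AltGr under the left thumb
    "RightAlt": VK_RMENU,      # AltGr stays where it is
    "Applications": VK_LMENU,  # Alt moves to the key with the menu icon (assumed VK)
    "RightWindows": VK_APPS,   # contextual menu moves to right Windows
}


def virtual_key(physical_key: str) -> int:
    """Look up the virtual key produced by a physical key in this sketch."""
    return REMAP[physical_key]


if __name__ == "__main__":
    for key, vk in REMAP.items():
        print(f"{key:13} -> 0x{vk:02X}")
```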

> On the French keyboard, CONTROL and SHIFT+CONTROL must be reserved on 7 successive keys of the first row ("5([", "6-|", "7è`", "8_\", "9ç^", "0à@", "°)]"); they are needed to get ASCII controls.
> However, CONTROL+@ is extremely rarely needed in applications to enter a NULL control, which will almost always be filtered out silently; only some editors that allow loading and editing binary files will use it, e.g. Emacs or Vim, which have a "binary editing" mode that avoids altering the encoding of newlines but displays all controls explicitly and does not limit the "line length". (Personally I prefer not using text editors to edit binary files: this is far too unsafe with their "insertion" working mode, and it is highly preferable and much simpler to use a hexadecimal editor.)
> This means that CONTROL+"0à@" may be assigned something else more useful (even if the MSKLC compiler warns about it).
> But you can assign characters with CONTROL and CONTROL+SHIFT for the 6 other keys of the first row ("²", "1&", "2é~", "3"#", "4'{" on the left side, and "+=}" on the last position to the right).

I ended up no longer assigning any characters to Control shift states at all. To get the most out of a keyboard, it is best to use the Kana shift states. Their disadvantage is that Caps Lock can never act on them, at least for me. Perhaps somebody can program a driver where it does? That would mean adding some new attributes. BTW there are still unknown entities, like the mysterious GRPSELTAP.

> This means that CONTROL+4 can be safely assigned to U+02BC for the apostrophe letter, but the most common encoding of the French apostrophe is U+2019 (the closing single quote), as French normally does not use single quotation marks, or if it does, the closing quote cannot be followed by a letter and cannot be confused with a French apostrophe, which is always followed by a letter (or the number 1).

In German even less, where the single close-quote is the English open-quote, and the single open-quote looks like a comma. However, for quotations and nested quotations, the use of chevrons (angle quotation marks) is widespread. So U+2019 never means anything other than an apostrophe.
The problem with shortcuts is their relative clumsiness: for an apostrophe I'd rather hit just two keys than press Control. Ctrl + 4 would be less ergonomic for the apostrophe than having the apostrophe on Shift, which on certain keyboards already leads to typos. We must put much more into our dead key registries. U+02BC is an example of what to add on the CIRCUMFLEX dead key.
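
To illustrate what adding U+02BC to a dead key registry could look like, here is a minimal sketch of a circumflex dead key table. The table contents, the choice of the ASCII apostrophe as the base key for U+02BC, and the fallback behaviour are all assumptions made for the example, not a description of any existing layout.

```python
# Hypothetical registry for the CIRCUMFLEX dead key: base character typed
# after the dead key -> character produced.
CIRCUMFLEX_DEAD_KEY = {
    "a": "\u00E2",  # â
    "e": "\u00EA",  # ê
    "i": "\u00EE",  # î
    "o": "\u00F4",  # ô
    "u": "\u00FB",  # û
    " ": "^",       # dead key + space gives the spacing circumflex
    "'": "\u02BC",  # assumed slot for MODIFIER LETTER APOSTROPHE
}


def apply_dead_key(base: str, table=CIRCUMFLEX_DEAD_KEY, dead_char: str = "^") -> str:
    """Return the output of pressing the dead key and then `base`.

    Assumption: when the pair is not in the registry, the dead character
    and the base character are emitted one after the other.
    """
    return table.get(base, dead_char + base)


if __name__ == "__main__":
    typed = "l" + apply_dead_key("'") + "eau"
    print(typed, [f"U+{ord(c):04X}" for c in typed])
```

With such an entry, the letter apostrophe costs two keystrokes and no Control chord.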

> For now I've not seen any specific need for U+02BC in French (U+2019 is enough, even if it represents two distinct things in French, but in distinct non-colliding contexts).
> But of course U+02BC is needed for English, which needs the distinction from single quotes, because English apostrophes are used more permissively, including at the end of words just before a space, punctuation or the end of a line.
> In French it is not valid to use the apostrophe for elisions at the end of words; you need to use some abbreviation mark or style instead, or no mark at all.
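
The distinction being discussed is visible in the Unicode character properties: U+02BC is a letter, while U+2019 and U+0027 are punctuation. A short sketch using Python's standard `unicodedata` and `re` modules shows this, and shows how a word-final apostrophe survives simple word tokenization only when it is encoded as U+02BC. The helper names `describe` and `words` are illustrative.

```python
import re
import unicodedata


def describe(ch: str) -> tuple:
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


def words(text: str) -> list:
    """Split text into words using the regex engine's notion of \\w."""
    return re.findall(r"\w+", text)


if __name__ == "__main__":
    for ch in ("'", "\u2019", "\u02BC"):
        print(*describe(ch))
    # A word-final apostrophe, as in the English plural possessive.
    print(words("the dogs\u02BC bones"))  # apostrophe encoded as a letter
    print(words("the dogs\u2019 bones"))  # apostrophe encoded as a quote
```

Because U+02BC has general category Lm (modifier letter), it stays attached to "dogs"; U+2019 has category Pf (final punctuation) and is dropped like a closing quote.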

This is why in French there's no Apostrophe Catastrophe. Should we rely on this chance? IMO, no. Because this would lead us to:
– Avoid single quotation marks, which are very nice and useful as delimiters in texts for publishing, where U+0027 would look clumsy.
– Keep moving apostrophes to “secure” places instead of putting them properly at the beginning, as in _’Y a_.
> The French abbreviation mark can simply be a dot (same as the ASCII full stop punctuation), or the last letter can be written in superscript with styles: it is highly recommended not to use any Unicode superscript letters, the only exception being the superscript letter o used to abbreviate "primo" as "1º" or "numéro" as "nº", but this letter is also missing on standard French keyboards that assign a degree symbol and many French documents are using a degree sign for "n°" and "1°" (however mechanical typewriters assigned a key for typing "Nº" as a single keystroke (where it was narrower than typing N and degree, and with the letter o generally underlined), it was on the first row, and some PC keyboards are displaying it in the shift position of the first key "²"). Underlining superscripted letters for abbreviations is deprecated in French, except for "Nº" where it is still frequently seen.
> It is no longer recommended to use any dots (or hyphens) for abbreviations (except for abbreviations using only one letter such as "M." for "monsieur"): "S.N.C.F.", which was common in the 1960's and 1970's, is now just "SNCF" (and the capitalization of non-initial letters is dropped if this becomes an acronym as in "Insee", which was the ugly "I.N.S.E.E." or "I.N.S.É.É." in the 1960's; some people also want the restoration of accents when decapitalizing acronyms, so they write "Inséé"; and they also want accents on capitalized letters of non-acronym abbreviations such as "ÉAU" for the Arab Emirates in order to avoid confusion with "EAU", the capitalization of the French word meaning water; some old abbreviations like "É.-U." for the English "U.S." are no longer used, as it would become "ÉU" with the new rule and would be too easily confused with the European Union: instead we now use "US" or "USA", which have long since been lexicalized, and preferably "UE" for the European Union, but "EU" is still very common).
> The remaining cases in French are then just the elision apostrophe which only occurs between two letters, and U+2019 is now its most common encoding, generated by spell checkers (if this is not the ASCII single quote). U+02BC cannot be found anywhere (it won't make any semantic difference though and if ever spell checkers change their autocorrector to use U+02BC, no French user will really complain, provided that it is supported in the same fonts mapping U+2019; Winword knows which fonts it is using so it should not be a problem, but it should be simple to patch the spell checker so that it will accept U+02BC or U+2019 as equivalent in French to avoid unnecessary warnings, and then suggest U+02BC instead of U+2019 to replace the ASCII quote).
> Unfortunately, spell checkers in web browsers are still ignoring both U+2019 and U+02BC (e.g. Chrome, IE, Firefox... and in all Android IMEs that only propose the ASCII quote in their visual layouts... I don't know what Safari does on MacOS): they still only recognize the ASCII vertical quote, and incorrectly signal an "error" in the text editor (with red wavy underlining, which is also unnecessarily warning us almost everywhere in a way that cannot be disabled when entering texts in a language other than the default locale set in the browser, and when there's no locale selector for this spell checker enabled by default).
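
The patch suggested above, accepting U+02BC and U+2019 as equivalent in French, amounts to folding all apostrophe variants to one form before dictionary lookup. A minimal sketch follows; the function names and the tiny word list are hypothetical and do not reflect how any actual spell checker is implemented.

```python
# Characters treated as interchangeable apostrophes for lookup purposes.
APOSTROPHES = {"'", "\u2019", "\u02BC"}


def normalize_apostrophes(word: str, target: str = "\u2019") -> str:
    """Replace every apostrophe variant in `word` with `target`."""
    return "".join(target if ch in APOSTROPHES else ch for ch in word)


def is_known(word: str, dictionary: set) -> bool:
    """Check a word against a dictionary, ignoring the apostrophe encoding."""
    folded = {normalize_apostrophes(entry) for entry in dictionary}
    return normalize_apostrophes(word) in folded


def suggest(word: str, preferred: str = "\u02BC") -> str:
    """Suggest the same word with the preferred apostrophe encoding."""
    return normalize_apostrophes(word, target=preferred)


if __name__ == "__main__":
    french = {"l\u2019eau", "aujourd\u2019hui"}  # toy dictionary
    for candidate in ("l'eau", "l\u2019eau", "l\u02BCeau", "leau"):
        print(repr(candidate), is_known(candidate, french))
```

The `preferred` parameter reflects the choice left open in the message: whether the autocorrector should replace the ASCII quote with U+2019 or with U+02BC.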

I agree. These spell-checkers bug me more than anything, even if they're useful.
Yes, it should be simple.
Thanks again for this useful advice. Sorry for sometimes drifting somewhat off topic :)