Another take on the English Apostrophe in Unicode

Philippe Verdy verdy_p at wanadoo.fr
Mon Jun 15 03:11:34 CDT 2015


2015-06-15 8:23 GMT+02:00 Marcel Schneider <charupdate at orange.fr>:

> On Fri, Jun 12, 2015, Philippe Verdy <verdy_p at wanadoo.fr> wrote:
> Even the Language bar uses the upper row to define shortcuts with Control,
> Shift+Control, Shift+Alt to switch between keyboard layouts, which are
> prioritized.
>
These are application shortcuts, but those modifier-key combinations are
used with the function keys (F1...F12), not with keys on the alphanumeric
part of the keyboard. So there's no conflict.

It is therefore normal not to assign non-control characters to CTRL+key or
CTRL+SHIFT+key (independently of the CapsLock state) when the same key is
used to type one of the ASCII characters in the range U+0040..U+005F,
because those combinations conventionally produce the ASCII controls. This
means that 32 positions on the keyboard must not be used for any other
assignment.
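
To make the count of 32 concrete, here is a minimal sketch in Python of the
conventional ASCII rule (CONTROL clears the high bits of a character in
U+0040..U+005F). The names ctrl_code and RESERVED are only illustrative; this
is not how any particular keyboard driver is implemented.

```python
def ctrl_code(ch):
    """Return the ASCII control code conventionally produced by CONTROL+ch.

    Illustrative helper: assumes the classic rule "keep the low 5 bits".
    """
    if len(ch) != 1 or not ch.isascii():
        raise ValueError("expected a single ASCII character")
    cp = ord(ch.upper())  # CONTROL ignores letter case (and CapsLock)
    if not 0x40 <= cp <= 0x5F:
        raise ValueError("no control code is associated with %r" % ch)
    return cp & 0x1F


# The 32 characters whose CONTROL combinations are reserved: @, A..Z, [ \ ] ^ _
RESERVED = {chr(cp): ctrl_code(chr(cp)) for cp in range(0x40, 0x60)}

if __name__ == "__main__":
    for ch, code in RESERVED.items():
        print("CONTROL+%s -> U+%04X" % (ch, code))
```

Under this rule CONTROL+@ gives NULL (U+0000) and CONTROL+[ gives ESC
(U+001B).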

The same remark applies to ALT+digit and ALT+letter: otherwise keyboard
shortcuts for application menus or for navigation in web forms won't work
correctly, or will take priority when you intended to type a valid
character, triggering those application functions instead of accepting your
character input.

MSKLC performs these "safety checks" and will issue warnings if you do so.

This is not just "my" advice: it is documented in the ISO standard.

> So to test the shortcuts with Clavier+, I must first remove shortcuts in
> the Language bar. Then the way was free to test Mr Overingtonʼs shortcuts
> for curly apostrophes (I will send the result just after). When I deleted
> the shortcuts in Clavier+ to test your advice, I found no application
> shortcuts for Ctrl+4 while the keys 1, 2, 5 and 0 are usually mapped as
> Word shortcut with CONTROL, while the heading formatting is with ALT. But
> indeed among ASCII controls I found eight on the French keyboard:
>
>
> //VirtualKey |ScanCd |ISO_# |Ctrl
> {VK_ESCAPE /*T01 */ ,0x001b
> {VK_CANCEL /*X46 */ ,0x0003
> {VK_BACK /*T0E E13*/ ,0x007f
> {VK_OEM_6 /*T1A D11*/ ,0x001b
> {VK_OEM_1 /*T1B D12*/ ,0x001d
> {VK_OEM_5 /*T2B C12*/ ,0x001c
> {VK_RETURN /*T1C C13*/ ,'\n'
> {VK_OEM_102 /*T56 B00*/ ,0x001c
>
> On the alphanumerical block, there are always the same five, three among
> them near the Enter key. The British-American Apostrophe key is exempt of
> Controls too. This is probably why Mr Overington wants to use CONTROL and
> SHIFT+CONTROL for U+2019 and U+02BC, as custom applications shortcuts.
>
Assigning characters to positions defined for application shortcuts is a
bad idea. Keyboard layouts should map characters to positions that are
independent of applications. Layouts may however be specific to an OS, if
the OS interface defines some standard shortcuts. This is a problem with
virtualized OSes, where there's a conflict with the shortcut used to switch
from the guest to the host: personally I have chosen the Application key
for this instead of the right Control key, because the Application key is
rarely needed, whereas I frequently type Control with the right hand or
with two hands, notably for CTRL+A, CTRL+C, CTRL+X, CTRL+V.

On the French keyboard, CONTROL and SHIFT+CONTROL must be reserved on 7
successive keys of the first row ("5([", "6-|", "7è`", "8_\", "9ç^", "0à@",
"°)]"): they are needed to get ASCII controls.

However CONTROL+@ is extremely rarely needed in applications: it enters a
NULL control that will almost always be filtered out silently. Only some
editors that allow loading and editing binary files will use it, e.g. Emacs
or Vim, which have a "binary editing" mode that avoids altering the
encoding of newlines, displays all controls explicitly, and does not limit
the "line length". (Personally I prefer not to use text editors to edit
binary files: this is too unsafe with their "insertion" working mode, and
it is highly preferable and much simpler to use a hexadecimal editor.)
This means that CONTROL+"0à@" may be assigned something else more useful
(even if the MSKLC compiler warns about it).

But you can assign characters with CONTROL and CONTROL+SHIFT on the 6
other keys of the first row ("²", "1&", "2é~", "3"#", "4'{" on the left
side, and "+=}" in the last position on the right).

This means that CONTROL+4 can safely be assigned to U+02BC for the
apostrophe letter. But the most common encoding of the French apostrophe is
U+2019 (the closing single quote): French normally does not use single
quotation marks, and where it does, the closing quote cannot be followed by
a letter, so it cannot be confused with a French apostrophe, which is
always followed by a letter (or by the number 1).
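
The rule above can be sketched as a small Python check. This is only an
illustration of the "followed by a letter (or 1)" criterion; the function
name elision_apostrophes is hypothetical, and real text may contain cases the
rule does not cover.

```python
import re

# U+2019 immediately followed by a letter or by the digit 1.
# [^\W\d_] is the usual idiom for "any Unicode letter" in Python's re module.
_ELISION = re.compile(r"\u2019(?=[^\W\d_]|1)")


def elision_apostrophes(text):
    """Return the indexes of U+2019 characters that act as French apostrophes.

    Any other U+2019 (followed by a space, punctuation or the end of the
    text) is assumed to be a closing single quotation mark.
    """
    return [m.start() for m in _ELISION.finditer(text)]


if __name__ == "__main__":
    print(elision_apostrophes("aujourd\u2019hui et l\u2019eau"))
    print(elision_apostrophes("il dit \u2018oui\u2019."))
```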

----

So far I have not seen any specific need for U+02BC in French (U+2019 is
enough, even if it represents two distinct things in French, but in
distinct non-colliding contexts).

But of course U+02BC is needed for English, which needs the distinction
from single quotes, because English apostrophes are used more permissively,
including at the end of words just before a space, punctuation or the end
of a line.

In French it is not valid to use the apostrophe for elisions at the end of
words: you need to use some abbreviation mark or style instead, or no mark
at all.

----

The French abbreviation mark can simply be a dot (the same as the ASCII
full stop punctuation), or the last letter written in superscript with
styles. It is highly recommended not to use any Unicode superscript
letters; the only exception is the superscript letter o used to abbreviate
"primo" as "1º" or "numéro" as "nº". But this letter is also missing on
standard French keyboards, which assign a degree symbol instead, and many
French documents use a degree sign for "n°" and "1°". (Mechanical
typewriters, however, had a key on the first row for typing "Nº" as a
single keystroke: it was narrower than typing N and degree, and the letter
o was generally underlined. Some PC keyboards display it in the shift
position of the first key "²".) Underlining superscripted letters for
abbreviations is deprecated in French, except for "Nº" where it is still
frequently seen.

It is no longer recommended to use any dots (or hyphens) in abbreviations,
except for abbreviations using only one letter such as "M." for "monsieur":
"S.N.C.F.", which was common in the 1960s and 1970s, is now just "SNCF".
The capitalization of non-initial letters is dropped if the abbreviation
becomes an acronym, as in "Insee", which was the ugly "I.N.S.E.E." or
"I.N.S.É.É." in the 1960s. Some people also want accents restored when
decapitalizing acronyms, so they write "Inséé"; they also want accents on
the capital letters of non-acronym abbreviations such as "ÉAU" for the Arab
Emirates, to avoid confusion with "EAU", the capitalized French word
meaning water. Some old abbreviations like "É.-U." for the English "U.S."
are no longer used: it would become "ÉU" under the new rule and would be
too easily confused with the European Union. Instead we now use "US" or
"USA", which have long been lexicalized, and preferably "UE" for the
European Union, although "EU" is still very common.

----

The remaining case in French is then just the elision apostrophe, which
only occurs between two letters. U+2019 is now its most common encoding,
generated by spell checkers (when it is not the ASCII single quote); U+02BC
is found nowhere. It would not make any semantic difference, though: if
spell checkers ever change their autocorrection to use U+02BC, no French
user will really complain, provided that it is supported in the same fonts
that map U+2019. Winword knows which fonts it is using, so this should not
be a problem. It should also be simple to patch the spell checker so that
it accepts U+02BC and U+2019 as equivalent in French, to avoid unnecessary
warnings, and then suggests U+02BC instead of U+2019 to replace the ASCII
quote.
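
As a rough sketch of what such a patch could look like (in Python, with
hypothetical function names, and not based on how Winword or any real spell
checker works), the three apostrophe encodings can be folded to one before
dictionary lookup, and the preferred one substituted when suggesting a
correction:

```python
# ASCII quote, U+2019 and U+02BC are treated as the same apostrophe.
APOSTROPHES = "'\u2019\u02BC"
_FOLD = str.maketrans({c: "'" for c in APOSTROPHES})


def fold_apostrophes(word):
    """Map every apostrophe variant to the ASCII quote for comparison."""
    return word.translate(_FOLD)


def same_word(a, b):
    """True if two spellings differ only by their apostrophe encoding."""
    return fold_apostrophes(a) == fold_apostrophes(b)


def suggest(word, preferred="\u2019"):
    """Rewrite all apostrophes of a word with the preferred encoding.

    The default is U+2019 only because it is the common encoding today;
    pass "\u02BC" to get the behaviour imagined above.
    """
    return fold_apostrophes(word).replace("'", preferred)


if __name__ == "__main__":
    print(same_word("l\u2019eau", "l\u02BCeau"))
    print(suggest("l'eau", "\u02BC"))
```

Folding before comparison keeps the dictionary itself unchanged, which is
why the choice between U+2019 and U+02BC would stay invisible to the user.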

Unfortunately, spell checkers in web browsers still ignore both U+2019 and
U+02BC (e.g. Chrome, IE, Firefox, and all Android IMEs, which only propose
the ASCII quote in their visual layouts; I don't know what Safari does on
Mac OS). They still only recognize the ASCII vertical quote, and
incorrectly signal an "error" in the text editor with red wavy underlining.
That underlining also warns us unnecessarily almost everywhere, in a way
that cannot be disabled, when entering text in a language other than the
default locale set in the browser and when there is no locale selector for
this spell checker, which is enabled by default.