Support for Latin ligature IJ (was another thread)

Philippe Verdy verdy_p at wanadoo.fr
Wed Mar 30 16:42:20 CDT 2016


Note that the single letter "ĳ" in Dutch is often indistinguishable from "ÿ",
which is also commonly found as a convenient substitute in many old
documents encoded not with Unicode but with ISO8859-1. This has a caveat:
capitalization would produce "Y" (in ISO8859-1), possibly
followed by a combining diaeresis (in Unicode-encoded documents), instead of
the two letters "IJ" (more correct but not perfect) or the single "Ĳ" letter
(best choice).
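To make the capitalization caveat concrete, here is a minimal sketch in Python 3. It assumes only Python's built-in default (untailored) Unicode case mappings; the variable names and the fits_latin1 helper are mine, for illustration only.

```python
import unicodedata

y_diaeresis = "\u00ff"  # LATIN SMALL LETTER Y WITH DIAERESIS, present in ISO 8859-1
ij_letter = "\u0133"    # LATIN SMALL LIGATURE IJ, absent from ISO 8859-1


def fits_latin1(text):
    """Return True if every character of text can be encoded in ISO 8859-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


upper_y = y_diaeresis.upper()  # U+0178, a Y with diaeresis, never an IJ
upper_ij = ij_letter.upper()   # U+0132, the capital IJ letter

print(fits_latin1(y_diaeresis))  # True: the lowercase substitute fits in Latin-1
print(fits_latin1(upper_y))      # False: its uppercase has no Latin-1 code point
print(unicodedata.normalize("NFD", upper_y) == "Y\u0308")  # True: Y + combining diaeresis

# Default capitalization of the two-letter spelling only touches the "i".
print("ijsselmeer".capitalize())                              # Ijsselmeer
print("\u0133sselmeer".capitalize() == "\u0132sselmeer")      # True
```

Whichever way the "ÿ" substitute is uppercased, the result is some form of Y, while the single IJ letter uppercases as one unit without any language-specific tailoring.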

The use of "ÿ" in Dutch should also be considered an orthographic fault,
and it should be corrected into "ĳ" (to solve the capitalization problem),
but there are occurrences of "ÿ" in Dutch which are correct (notably in
borrowed French toponyms such as "L’Haÿ-les-Roses").

There may be similar examples in Belgium with French toponyms, but I
suspect that those Belgian-French toponyms have their own Dutch
"officialized" variant, which would be preferable to borrowing the
Belgian-French orthography. They would then not need "ÿ" and would
likely use "ĳ" instead, meaning that the autocorrection of "ÿ" into "ĳ" in
possible Belgian-French toponyms would also be correct for
Dutch-Belgian toponyms. It may also be correct for French-French toponyms
like "L’Haÿ-les-Roses", which would be transformed into "L’Haĳ-les-Roses"
in Belgian-Dutch, or "L’HAĲ-LES-ROSES" if capitalized. It would
however be incorrect to replace the "ĳ" (or "Ĳ") letter there by the two
letters "ij" (or "IJ") without the orthographic ligature...
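The difference between the single letter and the two-letter spelling can be checked against the Unicode normalization forms. This is a small sketch in Python 3 using the standard unicodedata module; the sample string is only the hypothetical autocorrected spelling discussed above, written with U+0133, and the variable names are mine.

```python
import unicodedata

IJ_SMALL = "\u0133"  # LATIN SMALL LIGATURE IJ
sample = "L\u2019Ha" + IJ_SMALL + "-les-Roses"  # hypothetical autocorrected form

# The character carries only a compatibility mapping to i + j.
print(unicodedata.decomposition(IJ_SMALL))  # <compat> 0069 006A

nfc = unicodedata.normalize("NFC", sample)
nfd = unicodedata.normalize("NFD", sample)
nfkc = unicodedata.normalize("NFKC", sample)

print(nfc == sample)   # True: canonical forms keep the single letter
print(nfd == sample)   # True
print(nfkc == sample)  # False: compatibility form splits it into two letters
print(nfkc)            # L’Haij-les-Roses
```

So a process that applies NFC or NFD leaves the letter alone, whereas one that applies NFKC or NFKD silently performs exactly the replacement by two letters described above.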

Out of curiosity, I looked into the Dutch Wikipedia to see how they wrote
"L’Haÿ-les-Roses": they don't transform the French "ÿ" into some Dutch "ĳ"
(and they don't have any other "officialized" Dutch orthography).

For this reason, the autocorrection of the "ÿ" letter into the "ĳ" letter
in Dutch is disabled by default (even though it would be needed when working
with old documents encoded with ISO8859-1).

The situation is more complex for the autocorrection of the "ij" digraph
(extremely frequent in old documents encoded with ISO8859-1) into the plain
"ĳ" letter, which seems to be active in various word processors (but which
causes problems with borrowed non-Dutch names).
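As a rough illustration of the two autocorrections just described, here is a minimal sketch in Python 3. It is not how any actual word processor works: the function name autocorrect_ij, its fix_y_diaeresis flag, and the tiny exception list are all hypothetical, and a real tool would need a proper dictionary.

```python
import re

# Hypothetical, illustrative exception list: words in which "ij" is not the
# Dutch letter (borrowed names, or i + j meeting at a morpheme boundary).
EXCEPTIONS = {"beijing", "fiji", "dijon", "bijectie"}

WORD = re.compile(r"\w+")


def autocorrect_ij(text, exceptions=EXCEPTIONS, fix_y_diaeresis=False):
    """Replace the ij/IJ digraph by U+0133/U+0132, word by word.

    The y-with-diaeresis substitution is opt-in (off by default), because
    that letter is legitimate in some borrowed toponyms.
    """
    def fix(match):
        word = match.group(0)
        if word.lower() in exceptions:
            return word  # leave listed words untouched
        word = word.replace("ij", "\u0133").replace("IJ", "\u0132")
        if fix_y_diaeresis:
            word = word.replace("\u00ff", "\u0133").replace("\u0178", "\u0132")
        return word

    return WORD.sub(fix, text)


corrected = autocorrect_ij("IJsselmeer, bijna vijf, Beijing")
print(corrected == "\u0132sselmeer, b\u0133na v\u0133f, Beijing")  # True
print(autocorrect_ij("Ha\u00ff") == "Ha\u00ff")                    # True: untouched by default
```

The exception list is where the problems with borrowed names show up: every name missing from it gets rewritten, which is why looking at whole words (or not autocorrecting at all) is safer than a blind two-letter substitution.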


2016-03-30 23:19 GMT+02:00 Philippe Verdy <verdy_p at wanadoo.fr>:

> In my opinion, the Dutch Ĳ/ĳ "ligature" is not really a ligature and
> should be treated exactly like Æ/æ or Œ/œ as a plain single letter.
>
> The use of IJ/ij (encoded as separate letters) is actually an
> orthographic fault, that a ligature will not help resolve.
>
> Thankfully, the decomposition of the "Ĳ" letter or "ĳ" into separate letters
> is only a compatibility decomposition, but it is not canonically equivalent.
>
> In such a case, the "ĳ" letter is soft-dotted also in Dutch and the two
> dots disappear when it has diacritics above.
>
> For Lithuanian, the "ĳ" letter is not soft-dotted, but effectively
> hard-coded (meaning also that it is really a ligature, and that the
> single letter should not be used at all, but encoded as i+j with a possible
> joiner...). In such a case, using the single letter "Ĳ/ĳ" meant only for
> Dutch is also an orthographic fault. But this also means that when you add
> diacritics in Lithuanian, you'll need to encode explicit dots (like in
> Turkish) to keep these dots!
>
>
> 2016-03-30 22:12 GMT+02:00 Marcel Schneider <charupdate at orange.fr>:
>
>> On Wed, 30 Mar 2016 00:14:59 +0100, Kent Karlsson  wrote [in the thread
>> “Re: Swapcase for Titlecase characters”]:
>>
>> […]
>>
>> > I still think ĳ should have the "soft-dotted" property (and that
>> > that property is finally implemented properly in various systems...).
>>
>> [Refers to:
>> Re: Case for letters j and J with acute from Kent Karlsson on 2016-02-09
>> http://www.unicode.org/mail-arch/unicode-ml/y2016-m02/0044.html]
>>
>> For ‘ĳ’ that may be unambiguous, but for ‘i’ there is a need of
>> locale-dependent tailoring, as for Lithuanian it should be hard-dotted.
>>
>> > I've heard that old typewriters used to have a key for Ĳ ĳ.
>>
>> Iʼve read it on Wikipedia, though Iʼve been unable to grab any image of
>> such off the internet.
>> This one is Dutch but has none:
>>
>>
>> https://www.bing.com/images/search?q=typewriter+dutch&view=detailv2&id=5473CA1D2B05879CE21B98CD9F729EE838A49E69&selectedindex=31&ccid=wLABJru4&simid=608029570327776271&thid=OIP.Mc0b00126bbb87be9b1d849df9b11a201o0&mode=overlay&first=1
>>
>> These machines have lowercase ĳ only, while the uppercase position is
>> given the florin sign:
>>
>> https://img1.etsystatic.com/062/0/5543707/il_570xN.794019731_fiyd.jpg
>>
>>
>> http://www.tiptopvintage.co.uk/wp-content/uploads/2015/05/Brown-Vendex-Typewriter-7.jpg
>>
>> > Maybe it should be reintroduced for Dutch computer keyboards,
>>
>> I am in favor. To achieve this, it would be sufficient to have an
>> ISO/IEC 9995-3 compliant keyboard layout for the Netherlands—and one for
>> Belgium, as there is already one for Canada and one for Germany (given
>> that ‘Ĳ’, ‘ĳ’ are included on T3).
>>
>> And to complete the job, all of these could be added to CLDR.
>>
>> > as well as used
>> > (for Dutch) in autocorrects (IJ -> Ĳ, ij -> ĳ) or spell correctors
>> > (looking at the whole word rather than just two letters, and then
>> > not restricted to Dutch per se, but certain Dutch names regardless
>> > of the language for the surrounding text).
>>
>> Itʼs urgent to spell the names correctly, notably because there are
>> insufficient equivalence classes in search engines. Correctly spelled
>> ‘Ĳsselmeer’ vs misspelled ‘IJsselmeer’ yields different numbers of
>> results:
>>
>> Bing Search: 2 850 000 vs 886 000
>> Google Search: 343 000 vs 345 000
>>
>> while DuckDuckGo, Startpage and Yahoo do not state the number of results
>> (that in any case is mainly theoretical since only the top 500 ones are
>> currently displayable).
>>
>> > That, in turn, would
>> > probably be a better approach than trying to have some special
>> > handling of the sequence "ij" in case mapping (for Dutch alone).
>>
>> In the current understanding there seems to be confusion about whether the
>> ‘Ĳ’ ligatures are to be used or are deprecated. The mere fact that they are
>> compatibility decomposable is cited[1] along with rule D21 to justify
>> separate encoding as ‘IJ’. TUS indeed seems to support that POV when it
>> declares Dutch as supported by the Latin-1 supplement. One page below, the
>> ‘Ĳ’ ligatures are discussed as compatibility characters, which does not
>> imply deprecation. And indeed, their replacement by two-letter sequences is
>> noted as a mere matter of fact.
>>
>> While atomic typing of ‘ĳ’ seems to be a relic from the ISO/IEC 646
>> era, I’m puzzled not to find any related autocorrect in the word processor
>> when Dutch is on (no instances found in MSO1043.acl of 2010), whereas
>> French ‘œ’ is supported in the French ACL.
>>
>> As for special case mapping for ‘ij’, it is increasingly implemented,
>> but yes, it remains a workaround that wonʼt be needed any longer as soon as
>> people switch to ISO/IEC 9995-3 keyboard layouts. In the era of
>> globalization, there is pretty much no other choice.
>>
>> Hopefully,
>>
>> Marcel
>>
>> [1] https://en.wikipedia.org/wiki/IJ_(digraph)#cite_note-15
>>
>>
>