0027, 02BC, 2019, or a new character?
Asmus Freytag (c) via Unicode
unicode at unicode.org
Fri Jan 19 14:08:23 CST 2018
On 1/19/2018 5:37 AM, Philippe Verdy wrote:
> Maybe the IDN could accept a new combining diacritic (a sort of
> right-side acute accent). After all, the Kazakh intent is not to define
> a new separate character but a modification of a base letter to create a
> single letter in their alphabet.
> So one could propose a COMBINING APOSTROPHE (whose spacing, non-combining
> version is 02BC), so that SPACE+COMBINING APOSTROPHE would render
> exactly like 02BC.
In the case of TLD IDNs, what is at issue is the very fact that it "renders
exactly like" 02BC (which renders exactly like 2019).
You can see the issue when you look at André's Twitter hashtags: you can
create two strings that look the same, but the part that is a hashtag is
different. That is deemed an unacceptable security risk for TLD IDNs.
If you encoded such a combining character, it would also not be eligible
for TLD IDNs.
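A minimal Python sketch of why look-alike strings are a problem, using an arbitrary test string "ab?cd" (hypothetical, not a real Kazakh word) in which only the apostrophe-like character varies. The three variants can render alike, yet they are distinct code point sequences, and neither NFC nor NFKC normalization folds them together, so a simple normalize-and-compare check would not catch the confusion. The helper names are illustrative only.

```python
import unicodedata

# Hypothetical test string; only the apostrophe-like character differs.
VARIANTS = {
    "U+0027": "ab\u0027cd",
    "U+02BC": "ab\u02bccd",
    "U+2019": "ab\u2019cd",
}

def code_points(s: str) -> str:
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

def distinct_after(form: str) -> int:
    """Count how many distinct strings remain after normalizing every variant."""
    return len({unicodedata.normalize(form, s) for s in VARIANTS.values()})

if __name__ == "__main__":
    for label, s in VARIANTS.items():
        print(label, code_points(s))
    # None of the three characters has a decomposition, so all stay distinct.
    print("NFC:", distinct_after("NFC"), "NFKC:", distinct_after("NFKC"))
```

Catching such pairs takes an explicit confusables or variant mapping rather than normalization, which is the role RZ-LGR plays for the root zone.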
> 2018-01-18 19:51 GMT+01:00 Asmus Freytag via Unicode
> <unicode at unicode.org>:
> Top-level IDN domain names cannot contain 02BC, 0027, or 2019.
> (RFC 6912 gives the rationale and RZ-LGR the implementation, see
> MSR-3 <https://www.icann.org/public-comments/msr-3-2018-01-17-en>)
> On 1/18/2018 3:00 AM, Andre Schappo via Unicode wrote:
>>> On 18 Jan 2018, at 08:21, Andre Schappo via Unicode
>>> <unicode at unicode.org> wrote:
>>>> On 16 Jan 2018, at 08:00, Richard Wordingham via Unicode
>>>> <unicode at unicode.org> wrote:
>>>> On Mon, 15 Jan 2018 20:16:21 -0800
>>>> James Kass via Unicode <unicode at unicode.org> wrote:
>>>>> It will probably be the ASCII apostrophe. The stated intent favors
>>>>> the apostrophe over diacritics or special characters to ensure
>>>>> the language can be input to computers with standard keyboards.
>>>> Typing U+0027 into a word processor takes planning. Of the three, it
>>>> should obviously be the modifier letter U+02BC, but I think what gets
>>>> stored will be U+0027 or the single quotation mark U+2019.
>>>> However, we shouldn't overlook the diacritic mark
>>>> U+0315 COMBINING COMMA ABOVE RIGHT.
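The practical difference between the four candidates shows up in their Unicode properties. A small sketch using Python's standard `unicodedata` module prints each character's name, General Category, and canonical combining class: U+02BC is a letter (Lm), U+0027 and U+2019 are punctuation (Po, Pf), and U+0315 is a nonspacing mark (Mn). Software that decides word or identifier boundaries by category would plausibly treat only the letter and the mark as word-internal; that is an assumption about typical implementations, not a claim about any specific product.

```python
import unicodedata

# The candidate characters discussed in this thread.
CANDIDATES = ["\u0027", "\u02bc", "\u2019", "\u0315"]

def describe(ch: str) -> tuple[str, str, str, int]:
    """Return (code point, character name, General Category, combining class)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )

if __name__ == "__main__":
    for ch in CANDIDATES:
        print(*describe(ch), sep="\t")
```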
>>> I have just tested Twitter hashtags and, as one would expect,
>>> U+02BC does not break hashtags. See
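One way to see why U+02BC keeps a hashtag intact while U+0027 and U+2019 cut it short is a category-based tokenizer. The sketch below is a hypothetical rule (letters, marks, numbers, and underscore continue a tag); it is not Twitter's actual algorithm, and `hashtag_body` is an illustrative name. The test word "ab?cd" is arbitrary.

```python
import unicodedata

def hashtag_body(text: str) -> str:
    """Hypothetical rule: after the first '#', keep characters whose
    General Category is a letter (L*), mark (M*), or number (N*), or '_';
    stop at the first character that is none of these."""
    start = text.find("#")
    if start < 0:
        return ""
    body = []
    for ch in text[start + 1:]:
        if ch == "_" or unicodedata.category(ch)[0] in "LMN":
            body.append(ch)
        else:
            break
    return "".join(body)

if __name__ == "__main__":
    for apostrophe in ("\u0027", "\u02bc", "\u2019"):
        tag = hashtag_body(f"#ab{apostrophe}cd")
        print(f"U+{ord(apostrophe):04X}", repr(tag))
```

Under this rule a tag containing U+0315 would also stay whole, since it is a mark (Mn); only the two punctuation characters end the tag early.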
>> ...and, just in case
>> André Schappo