From rick at unicode.org  Thu Jan  2 12:18:35 2014
From: rick at unicode.org (Rick McGowan)
Date: Thu, 02 Jan 2014 10:18:35 -0800
Subject: Mail list changes for 2014
In-Reply-To: <52C2F57A.2020108@unicode.org>
References: <529E619E.7030305@unicode.org> <52C2F57A.2020108@unicode.org>
Message-ID: <52C5AD7B.9020000@unicode.org>

Hello everyone.

The Unicode mail list has now been re-activated. If you experience 
trouble with subscription issues or functionality, please feel free to 
e-mail me directly.

Regards,
     Rick


On 12/31/2013 8:48 AM, Rick McGowan wrote:
> The mail list will now go off-line shortly, and be back after the new 
> year.
> Regards,
>     Rick
>
> On 12/3/2013 2:56 PM, Rick McGowan wrote:
>> At the end of the year, we will be changing the mail list server for 
>> the public-access mail lists, including this one. The new system will 
>> be Gnu "Mailman", an interface familiar to many. This should make it 
>> easier for users to handle their subscriptions and options in one 
>> place, via the web interface.
>>
>> We will thus be shutting down the public mail lists over the "holiday 
>> break" in the final days of 2013, and re-open with the new system in 
>> January 2014.
>>
>> Affected mail lists are those listed on the Mail Lists page here:
>>     http://www.unicode.org/consortium/distlist.html
>> including Unicode, CLDR-Users, ULI-Users, and Indic.
>>
>> The new mail list system is documented here: 
>> http://www.gnu.org/software/mailman/
>>
>


From richard.wordingham at ntlworld.com  Sun Jan  5 18:11:03 2014
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Mon, 6 Jan 2014 00:11:03 +0000
Subject: Codepoint Support for Phonetically-Aware Collation
Message-ID: <20140106001103.3356960c@JRWUBU2>

Several languages with phonetically ambiguous spelling take
pronunciation into account when sorting words alphabetically.  Typical
examples are Welsh and Slovak, where contractions are not applied for
chance combinations of characters ('ng' in Welsh and 'ch' in Slovak).
Less typically, visually opaque syllable boundaries are taken into
account, e.g. in Lao and in some older Thai dictionaries (though the
Thai examples I know of were compiled by Europeans).

There are two approaches to these ambiguities for correct automated
collation. One can either use a vocabulary-based collation table (as is
done for Tibetan-script languages) or use mark-up characters such as
U+00AD SOFT HYPHEN, U+200B ZERO WIDTH SPACE or U+034F COMBINING
GRAPHEME JOINER (CGJ) as appropriate to prevent contractions in
collation.  In the latter case, it is reasonable to assume that such
characters will only be used when it is likely that the text will be
subject to culturally-sensitive sorting. For example, the 'search'
collation settings for Welsh in the CLDR do not use the contractions
used for sorting Welsh, so one does not have to worry about the encoding
of the town name 'Bangor' unless it will be presented in an index in
Welsh - in which case Welsh inflections will be a greater source of
trouble.
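
As a minimal sketch of how a mark-up character can block a contraction, the toy tokenizer below splits a word into collation elements and treats CGJ as a separator. The digraph list is a small illustrative subset and the function name is hypothetical; a real implementation would rely on a CLDR tailoring rather than this hand-rolled logic.

```python
CGJ = "\u034f"  # COMBINING GRAPHEME JOINER

# Illustrative subset of Welsh digraph letters (assumption: not a full tailoring).
DIGRAPHS = ("ch", "dd", "ff", "ng", "ll", "ph", "rh", "th")


def collation_elements(word):
    """Split a word into toy collation elements; CGJ prevents a contraction."""
    elements = []
    word = word.lower()
    i = 0
    while i < len(word):
        if word[i] == CGJ:
            # CGJ is ignored as an element, but it has already kept
            # the letters on either side from forming a digraph.
            i += 1
            continue
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            elements.append(pair)
            i += 2
        else:
            elements.append(word[i])
            i += 1
    return elements


print(collation_elements("Bangor"))              # 'ng' taken as one letter
print(collation_elements("Ban" + CGJ + "gor"))   # 'n' and 'g' kept apart
```

The second call shows the encoding one would want for the town name if it is to sort under 'n' followed by 'g' in a Welsh index.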

CGJ may also be used to distinguish umlaut and diaeresis (both usually
encoded U+0308) in German, by encoding the diaeresis as <U+034F,
U+0308>.
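
The distinction survives normalization, which a short check with Python's standard `unicodedata` module can illustrate. This is only a sketch using the letter 'a' as an arbitrary base: the plain sequence composes under NFC, while the CGJ-marked one is left alone.

```python
import unicodedata

CGJ = "\u034f"        # COMBINING GRAPHEME JOINER
DIAERESIS = "\u0308"  # COMBINING DIAERESIS

# Plain <a, U+0308> composes to the precomposed letter under NFC.
umlaut = unicodedata.normalize("NFC", "a" + DIAERESIS)

# <a, U+034F, U+0308> stays as three code points: CGJ blocks the composition.
diaeresis = unicodedata.normalize("NFC", "a" + CGJ + DIAERESIS)

print([f"U+{ord(c):04X}" for c in umlaut])
print([f"U+{ord(c):04X}" for c in diaeresis])
```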

In some SE Asian dictionaries, an ordering distinction is made
between the use of the letter corresponding to Indic PA to represent a
voiced sound similar to /b/, used for native words, and the
unvoiced sound /p/, used in Indic loan words.  The examples I know of
are U+1794 KHMER LETTER BA and U+1A37 TAI THAM LETTER BA.  While it is
possible to represent the contrasting sound /p/ by <U+1794, U+17C9
KHMER SIGN MUUSIKATOAN> or U+1A38 TAI THAM LETTER HIGH PA respectively
instead, in many Indic loan words this is not done.  Is there any
encoding level mark-up available to distinguish between the two
pronunciations of BA when necessary?  I had thought the problem had
been solved for Khmer, but I can now find no evidence of a solution.

The usage of the two scripts shares the feature that as the first
element of what is or was a true consonant cluster, BA usually (always?)
has an unvoiced sound, not the voiced sound.  (Sound changes have
made the situation more complicated to describe in Tai Lue, Tai Khuen
and Northern Thai, but the principle remains unchanged.)  This
complicates the use of what to me had seemed obvious, namely to use
<BA, CGJ> to represent the unvoiced sound.  It would be more natural to
use <BA, CGJ, COENG/SAKOT> to indicate the voiced sound should it
appear in clusters in foreign loanwords.

Richard.


From naenaguru at gmail.com  Wed Jan  8 22:43:38 2014
From: naenaguru at gmail.com (Naena Guru)
Date: Thu, 9 Jan 2014 10:13:38 +0530
Subject: interaction of Arabic ligatures with vowel marks
In-Reply-To: <51B7E66B.1050101@gmail.com>
References: <51B7E66B.1050101@gmail.com>
Message-ID: <CAHK3Hy0C2D=VGFPUsFYCGHd7BepjuV4MB+Mswghp0ufwBiqHmA@mail.gmail.com>

Please see this page: (for IE, use v 2010 and up)
http://lovatasinhala.com/

The font is almost all ligatures. If you copy and inspect the text, you'll
notice that it is simple romanized Singhala. I am currently in Sri Lanka
demonstrating this. People at the president's office and one of the
powerful ministers have seen it. They are elated that, after all, Singhala,
the most complex of the 'abugidas', is much like a Western European language and
amazingly computer and user friendly. This is contrary to how it was
portrayed to them by local academics and technocrats causing the poor
country unnecessary debt.

The ideas of 'abugida' and 'complex' fade away if a font is made fully
understanding Unicode's description of ligatures and how they are
implemented by OpenType (now OpenFont). I believe that Arabic and Hebrew
can follow this model so that typing the script is simplified for users
without compromising orthography.


On Wed, Jun 12, 2013 at 8:39 AM, Stephan Stiller
<stephan.stiller at gmail.com>wrote:

> Hi,
>
> How is the placement of vowel marks around ligatures handled in Arabic
> text?
>
> Does anyone have good pointers on this topic?
>
> My guess is that this does not come up often (just like the topic of
> pointing for handwritten Hebrew), as vowel marks are mostly not added in
> ordinary text. Nonetheless, any text making heavy use of ligatures will
> from time to time need to add vowel marks for a foreign name or as a
> reading aid, and (as many of us know) the Quran is traditionally printed
> with vowel marks.
>
> I'm also wondering how font designers normally handle this. I think there
> are analogous questions for various ligature-heavy abugidas, so there must
> be an existing body of knowledge. There should be better answers than
> "squeeze the vowels around the consonant clusters in whatever way seems
> most intuitive". Do traditional printing presses use extra metal types for
> such glyph clusters, or do they manually add and adjust the positioning of
> vowels?
>
> Stephan
>
>
>

From pravin.d.s at gmail.com  Fri Jan 10 04:15:00 2014
From: pravin.d.s at gmail.com (pravin.d.s at gmail.com)
Date: Fri, 10 Jan 2014 15:45:00 +0530
Subject: Handling Malayalam "NTA" issue for Lohit2
Message-ID: <CALuKHAeWWYTv4NkH-1mgzA0J3qHBVmxyOCdKvFwu3P20swWs+Q@mail.gmail.com>

Hi All,

    We are working on the lohit2 [1] project, whose plan is to create standard
and reusable OpenType tables with additional improvements. Lohit, as the
default system font in most open-source distros, has always followed the
standards around language technology (font specifications, storage, and
language-related guidelines).

    Recently we started working on the Lohit Malayalam font [2] with some
planned improvements and came across a couple of bugs [3][4] related to the
well-known "NTA" issue, introduced with the addition of atomic chillu
characters in Unicode 5.1.

    Now the dilemma is that a number of users are already using

    A. U+0D28 + U+0D4D + U+0D31 to get the NTA character, since even before
       Unicode 5.1,

    B. but Unicode from 5.1 onward says (TUS 6.2, chapter 9.9, p. 321) to use
       U+0D7B + U+0D4D + U+0D31 to get the same "NTA".

    In my humble opinion, one thing is very clear here: Unicode forgot
to add normalization (backward compatibility) for the newly added sequence
in (B). I still have not seen any improvement on this after a long time.

    Now the dilemma with lohit2 development is:

    - Lohit 1 has supported sequence (A) for a long time (even before
Unicode 5.1), so for backward compatibility lohit2 should support the
same.

    - Since Lohit follows standards, it is important to support sequence
(B) in order to follow Unicode 6.3. But following Unicode 6.3 in this case
clearly invites dual encoding without any normalization rules at hand.

    Good documentation on the NTA issues is available at [5].

    Presently I am in favour of not supporting the Unicode-defined sequence (B)
in lohit2 and continuing to use (A), which has been used in the Lohit font
family for a long time.

    Please let me know your views on it. Is there any chance of getting this
mentioned in Unicode chapter 9? Is there any chance of a normalization rule
for this?


Regards,
Pravin Satpute


1.
http://pravin-s.blogspot.in/2013/08/project-creating-standard-and-reusable.html
2.
http://pravin-s.blogspot.in/2013/12/lohit2-lohit-malayalam-development-plans.html
3. https://bugzilla.redhat.com/show_bug.cgi?id=1016984
4. https://bugzilla.redhat.com/show_bug.cgi?id=1016989
5. http://thottingal.in/documents/Malayalam-NTA.pdf

From samjnaa at gmail.com  Fri Jan 10 06:24:46 2014
From: samjnaa at gmail.com (Shriramana Sharma)
Date: Fri, 10 Jan 2014 17:54:46 +0530
Subject: [Lohit-devel-list] Handling Malayalam "NTA" issue for Lohit2
In-Reply-To: <CALuKHAeWWYTv4NkH-1mgzA0J3qHBVmxyOCdKvFwu3P20swWs+Q@mail.gmail.com>
References: <CALuKHAeWWYTv4NkH-1mgzA0J3qHBVmxyOCdKvFwu3P20swWs+Q@mail.gmail.com>
Message-ID: <CAH-HCWVX3AjnR9=qR6ai02sYh7TWVHLGxKvVYuKPDitP7UJqcw@mail.gmail.com>

On Fri, Jan 10, 2014 at 3:45 PM, pravin.d.s at gmail.com
<pravin.d.s at gmail.com> wrote:
>     In my humble opinion here one thing is very clear that Unicode forgot to
> add normalization (backward compatibility) for newly added sequence in (B).

Dear Pravin,

If by normalization you mean
http://www.unicode.org/glossary/#normalization -- then it is not
possible in this case since the individually encoded chillus do not
have canonical decomposition to their related consonants. Indeed, that
would defeat the purpose of the separate encoding, which was to
provide semantically distinct chillus!
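
This can be checked with Python's standard `unicodedata` module. The sketch below, with illustrative variable names, confirms that CHILLU N carries no decomposition mapping and that sequences (A) and (B) stay distinct under every normalization form.

```python
import unicodedata

SEQ_A = "\u0d28\u0d4d\u0d31"  # NA, VIRAMA, RRA (legacy sequence A)
SEQ_B = "\u0d7b\u0d4d\u0d31"  # CHILLU N, VIRAMA, RRA (sequence B)

# An empty string means the character has no decomposition mapping.
chillu_n_mapping = unicodedata.decomposition("\u0d7b")
print(repr(chillu_n_mapping))

# Record, per normalization form, whether the two sequences become equal.
equal_after = {
    form: unicodedata.normalize(form, SEQ_A) == unicodedata.normalize(form, SEQ_B)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}
print(equal_after)
```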

The recent additional chillus trickling into the standard seems to
indicate that one should have encoded a CHILLU MARKER back then, but
there's no going back now, so chillus galore! ;-)

On a more serious note, I think it is important to adhere to the
standard, as it is good for you in the long run even though it is
difficult at first. If you delay the adoption of the standard, it only
gets all the harder as time passes, since in the interim even more
people continue to assume the old behaviour...

-- 
Shriramana Sharma


From paivakil at gmail.com  Fri Jan 10 11:46:30 2014
From: paivakil at gmail.com (Mahesh T. Pai)
Date: Fri, 10 Jan 2014 23:16:30 +0530
Subject: Handling Malayalam "NTA" issue for Lohit2
In-Reply-To: <CALuKHAeWWYTv4NkH-1mgzA0J3qHBVmxyOCdKvFwu3P20swWs+Q@mail.gmail.com>
References: <CALuKHAeWWYTv4NkH-1mgzA0J3qHBVmxyOCdKvFwu3P20swWs+Q@mail.gmail.com>
Message-ID: <20140110174630.GA18104@localhost>

pravin.d.s at gmail.com said on Fri, Jan 10, 2014 at 03:45:00PM +0530,:
 >     - Lohit 1 is supporting sequence (A) from long time (even before
 > Unicode 5.1), so for the backward compatibility lohit2 should support the
 > same.
 > 

I believe that the UTC wanted to maintain compatibility with some
_beta_ version of some Microsoft software in making the choice that
it did regarding the /nta/ sequence.


 >     Presently i am in favour of not supporting Unicode defined
 > sequence (B) in lohit2 and keep on using (A) which is used in Lohit
 > fonts family from long time.

Allow me to go on a nostalgia trip. Almost a decade back, the then SMC
team came across what was an obvious lack of clarity in the UTS. They
decided, against my advice, to follow the suggestions in the OpenType
definition. To be fair, then, I had no alternative to offer, except
not to implement the suggestion in the OpenType pages. Microsoft
ultimately waited for some clarity in the UTS before implementing
anything, and the community efforts went (mostly) in vain.

Right now, given a choice between supporting legacy data and
standards, I will choose the latter, with some kind of jugaad based on
the PUA / glyph name to enable support for legacy data. 

Not the ideal situation, but when politics gets the upper hand over
merits, efficiency and appropriateness always take a backseat.

-- 
Mahesh T. Pai   ||
free -  (adj) able to  act at will;  not hampered;
       not  under  compulsion  or restraint;  free
       from  obligations or  duties; not  bound to
       servitude; at liberty.


From pravin.d.s at gmail.com  Mon Jan 13 00:04:33 2014
From: pravin.d.s at gmail.com (pravin.d.s at gmail.com)
Date: Mon, 13 Jan 2014 11:34:33 +0530
Subject: [Lohit-devel-list] Handling Malayalam "NTA" issue for Lohit2
In-Reply-To: <CAH-HCWVX3AjnR9=qR6ai02sYh7TWVHLGxKvVYuKPDitP7UJqcw@mail.gmail.com>
References: <CALuKHAeWWYTv4NkH-1mgzA0J3qHBVmxyOCdKvFwu3P20swWs+Q@mail.gmail.com>
 <CAH-HCWVX3AjnR9=qR6ai02sYh7TWVHLGxKvVYuKPDitP7UJqcw@mail.gmail.com>
Message-ID: <CALuKHAcGNpqYLdurbWnnqiS+5=2APinPbKWS4Hs+Lt994qgyeQ@mail.gmail.com>

On 10 January 2014 17:54, Shriramana Sharma <samjnaa at gmail.com> wrote:

> On Fri, Jan 10, 2014 at 3:45 PM, pravin.d.s at gmail.com
> <pravin.d.s at gmail.com> wrote:
> >     In my humble opinion here one thing is very clear that Unicode
> forgot to
> > add normalization (backward compatibility) for newly added sequence in
> (B).
>
> Dear Pravin,
>
> If by normalization you mean
> http://www.unicode.org/glossary/#normalization -- then it is not
> possible in this case since the individually encoded chillus do not
> have canonical decomposition to their related consonants. Indeed, that
> would defeat the purpose of the separate encoding, which was to
> provide semantically distinct chillus!
>

OK, not normalization, but Unicode should at least mention the old way of
writing NTA alongside the new one that came with the atomic chillus. It would
definitely help people working on NLP to handle data containing these two
different sequences.


>
> On a more serious note, I think it is important to adhere to the
> standard, as it is good for you in the long run even though it is
> difficult at first. If you delay the adoption of the standard, it only
> gets all the harder as time passes, since in the interim even more
> people continue to assume the old behaviour...
>

From a font perspective, NTA data exists out there in both forms (A) and (B),
so we have to add the required rules for both. Unfortunately, Unicode has not
considered backward compatibility in this case, but the Lohit project
definitely does.

So, to be on the safer side, I am now in favour of having both rules in the font.

Regards,
Pravin Satpute

From pravin.d.s at gmail.com  Mon Jan 13 00:28:52 2014
From: pravin.d.s at gmail.com (pravin.d.s at gmail.com)
Date: Mon, 13 Jan 2014 11:58:52 +0530
Subject: Handling Malayalam "NTA" issue for Lohit2
In-Reply-To: <20140110174630.GA18104@localhost>
References: <CALuKHAeWWYTv4NkH-1mgzA0J3qHBVmxyOCdKvFwu3P20swWs+Q@mail.gmail.com>
 <20140110174630.GA18104@localhost>
Message-ID: <CALuKHAeatXn4TkgpL-Qu8Uc+O41H8pwH+DTfNiK35wwnnytmAA@mail.gmail.com>

On 10 January 2014 23:16, Mahesh T. Pai <paivakil at gmail.com> wrote:

> pravin.d.s at gmail.com said on Fri, Jan 10, 2014 at 03:45:00PM +0530,:
>     - Lohit 1 is supporting sequence (A) from long time (even before
>  > Unicode 5.1), so for the backward compatibility lohit2 should support
> the
>  > same.
>  >
>
> I believe that the UTC wanted to maintain compatibility with some
> _beta_ version of some Microsoft software in making the choice that
> it did regarding the /nta/ sequence.
>
>
>  >     Presently i am in favour of not supporting Unicode defined
>  > sequence (B) in lohit2 and keep on using (A) which is used in Lohit
>  > fonts family from long time.
>
> Allow me to go on a nostalgia trip. Almost a decade back, the then SMC
> team came across what was an obvious lack of clarity in the UTS. They
> decided, against my advice, to follow the suggestions in the OpenType
> definition. To be fair, then, I had no alternative to offer, except
> not to implement the suggestion in the OpenType pages. Microsoft
> ultimately waited for some clarity in the UTS before implementing
> anything, and the community efforts went (mostly) in vain.
>

I was wondering how ISCII was handling this.


>
> Right now, given a choice between supporting legacy data and
> standards, I will choose the latter, with some kind of jugaad based on
> the PUA / glyph name to enable support for legacy data.
>

Yeah, as said above, we will support both the legacy and the standard sequences.


>
> Not the ideal situation, but when politics gets the upper hand over
> merits, efficiency and appropriateness always take a backseat.
>

That is the pain point of standardization activities.

Thanks & Regards,
Pravin Satpute

From cibucj at gmail.com  Mon Jan 13 00:32:16 2014
From: cibucj at gmail.com (സിബു സി ജെ)
Date: Sun, 12 Jan 2014 22:32:16 -0800
Subject: [Lohit-devel-list] Handling Malayalam "NTA" issue for Lohit2
In-Reply-To: <CALuKHAcGNpqYLdurbWnnqiS+5=2APinPbKWS4Hs+Lt994qgyeQ@mail.gmail.com>
References: <CALuKHAeWWYTv4NkH-1mgzA0J3qHBVmxyOCdKvFwu3P20swWs+Q@mail.gmail.com>
 <CAH-HCWVX3AjnR9=qR6ai02sYh7TWVHLGxKvVYuKPDitP7UJqcw@mail.gmail.com>
 <CALuKHAcGNpqYLdurbWnnqiS+5=2APinPbKWS4Hs+Lt994qgyeQ@mail.gmail.com>
Message-ID: <CAD8TiP4SejuB_Cqs9-XPW5ddN_KiYkGjn8=48+KTggTXLY-n7g@mail.gmail.com>

In fact, there is one more sequence to consider. Kartika in Windows follows
<NA, VIRAMA, ZWJ, RRA> for NTA. However, there is very little existing data
in that sequence.

In the case of chillus, the standard asks display software to be prepared for
data in both sequences. I agree that it could likewise document NTA's legacy
vs. standard sequences.
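
For NLP or search code that has to cope with all three encodings, one option is to fold the legacy sequences onto the standard one before comparing strings. The sketch below is only an assumption about how such a fold might look: the names are hypothetical, and the fold is lossy, since <NA, VIRAMA, RRA> may be intended as something other than NTA in some data. It is better suited to building match keys than to rewriting stored text.

```python
NA, VIRAMA, RRA = "\u0d28", "\u0d4d", "\u0d31"
ZWJ = "\u200d"
CHILLU_N = "\u0d7b"

STANDARD_NTA = CHILLU_N + VIRAMA + RRA   # sequence (B)
LEGACY_NTA = (
    NA + VIRAMA + ZWJ + RRA,             # sequence reported for Kartika
    NA + VIRAMA + RRA,                   # sequence (A)
)


def fold_nta(text):
    """Return a match key in which legacy NTA spellings use sequence (B)."""
    for legacy in LEGACY_NTA:
        text = text.replace(legacy, STANDARD_NTA)
    return text


sample = "x" + LEGACY_NTA[1] + "y" + LEGACY_NTA[0] + "z" + STANDARD_NTA
print(fold_nta(sample) == "x" + STANDARD_NTA + "y" + STANDARD_NTA + "z" + STANDARD_NTA)
```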


2014/1/12 pravin.d.s at gmail.com <pravin.d.s at gmail.com>

>
>
>
> On 10 January 2014 17:54, Shriramana Sharma <samjnaa at gmail.com> wrote:
>
>> On Fri, Jan 10, 2014 at 3:45 PM, pravin.d.s at gmail.com
>> <pravin.d.s at gmail.com> wrote:
>> >     In my humble opinion here one thing is very clear that Unicode
>> forgot to
>> > add normalization (backward compatibility) for newly added sequence in
>> (B).
>>
>> Dear Pravin,
>>
>> If by normalization you mean
>> http://www.unicode.org/glossary/#normalization -- then it is not
>> possible in this case since the individually encoded chillus do not
>> have canonical decomposition to their related consonants. Indeed, that
>> would defeat the purpose of the separate encoding, which was to
>> provide semantically distinct chillus!
>>
>
> Ok not normalization but at least Unicode should mention old habit of
> writing NTA and new with addition of atomic chillu. It will definitely help
> people working on NLP to handle data having these two different sequence.
>
>
>>
>> On a more serious note, I think it is important to adhere to the
>> standard, as it is good for you in the long run even though it is
>> difficult at first. If you delay the adoption of the standard, it only
>> gets all the harder as time passes, since in the interim even more
>> people continue to assume the old behaviour...
>>
>
> From a font perspective, NTA data exists out there in both forms (A) and (B),
> so we have to add the required rules for both. Unfortunately, Unicode has not
> considered backward compatibility in this case, but the Lohit project
> definitely does.
>
> So, to be on the safer side, I am now in favour of having both rules in the font.
>
> Regards,
> Pravin Satpute
>
>
>
> _______________________________________________
> Indic mailing list
> Indic at unicode.org
> http://unicode.org/mailman/listinfo/indic
>
>

From infofarmer at gmail.com  Tue Jan 14 18:06:23 2014
From: infofarmer at gmail.com (Andrew Pantyukhin)
Date: Wed, 15 Jan 2014 04:06:23 +0400
Subject: CJK IDS database
Message-ID: <CA+qFSQ4w-0GESsNznsciFT-mx+jpTWH0E27MCzsCF3d7VG8wkA@mail.gmail.com>

Hi!

I find Ideographic Description Sequences massively useful for studying and
describing Chinese characters. However, I found only one comprehensive
source of them: http://macchiato.com/ids/
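
For readers unfamiliar with the format, an IDS is a prefix expression: an Ideographic Description Character (IDC) is followed by its two or three operands, each of which is a component or a nested sequence. The parser below is a minimal sketch covering only the twelve IDCs U+2FF0..U+2FFB; the function names and the sample sequence are illustrative and not taken from any of the databases discussed here.

```python
def idc_arity(ch):
    """Return the operand count for an IDC, or 0 for an ordinary component."""
    cp = ord(ch)
    if cp in (0x2FF2, 0x2FF3):       # three-part left-to-right / above-to-below
        return 3
    if 0x2FF0 <= cp <= 0x2FFB:       # all other IDCs in this range are binary
        return 2
    return 0


def parse_ids(ids, pos=0):
    """Parse one IDS starting at pos; return (tree, next_pos)."""
    if pos >= len(ids):
        raise ValueError("truncated IDS")
    ch = ids[pos]
    pos += 1
    arity = idc_arity(ch)
    if arity == 0:
        return ch, pos               # a leaf component
    operands = []
    for _ in range(arity):
        operand, pos = parse_ids(ids, pos)
        operands.append(operand)
    return (ch, *operands), pos


tree, end = parse_ids("⿱艹⿰氵工")
print(tree)
```

Private-use characters in a sequence would simply come out as leaf components here, which is one reason their meaning needs to be documented by whoever produced the data.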

Does anyone know where the files come from? Were they part of the IRG
process, or just an isolated effort? What are the private use characters in
the sequences?

I'd like to contribute to the IDS database and incorporate it into products
like wiktionary and rikaikun.

From michel at suignard.com  Tue Jan 14 21:36:07 2014
From: michel at suignard.com (Michel Suignard)
Date: Wed, 15 Jan 2014 03:36:07 +0000
Subject: CJK IDS database
In-Reply-To: <CA+qFSQ4w-0GESsNznsciFT-mx+jpTWH0E27MCzsCF3d7VG8wkA@mail.gmail.com>
References: <CA+qFSQ4w-0GESsNznsciFT-mx+jpTWH0E27MCzsCF3d7VG8wkA@mail.gmail.com>
Message-ID: <18d0f28f79804234b9d301aa40f4bf32@CO1PR02MB157.namprd02.prod.outlook.com>

I guess you should ask the owner, our distinguished president.
Michel

From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Andrew Pantyukhin
Sent: Tuesday, January 14, 2014 4:06 PM
To: unicode at unicode.org
Subject: CJK IDS database

Hi!

I find Ideographic Description Sequences massively useful for studying and describing Chinese characters. However, I found only one comprehensive source of them: http://macchiato.com/ids/

Does anyone know where the files come from? Were they part of the IRG process, or just an isolated effort? What are the private use characters in the sequences?

I'd like to contribute to the IDS database and incorporate it into products like wiktionary and rikaikun.


From mark at macchiato.com  Tue Jan 14 23:53:51 2014
From: mark at macchiato.com (Mark Davis ☕)
Date: Wed, 15 Jan 2014 06:53:51 +0100
Subject: CJK IDS database
In-Reply-To: <18d0f28f79804234b9d301aa40f4bf32@CO1PR02MB157.namprd02.prod.outlook.com>
References: <CA+qFSQ4w-0GESsNznsciFT-mx+jpTWH0E27MCzsCF3d7VG8wkA@mail.gmail.com>
 <18d0f28f79804234b9d301aa40f4bf32@CO1PR02MB157.namprd02.prod.outlook.com>
Message-ID: <CAJ2xs_H3pQ3DBKHS+za1G+kAex2bwSNSTbmk8RydxTKZdk5qow@mail.gmail.com>

Boy, I'd forgotten about those. There is an open-source collection of IDSs
that I used to create those files. Unfortunately, I found that *that* data
would take a lot of cleanup.

I do agree that it would be very useful to have an open-source repository
of IDSs for Unicode characters, but I don't know of one. Others?


Mark <https://google.com/+MarkDavis>

*- Il meglio è l'inimico del bene -*


On Wed, Jan 15, 2014 at 4:36 AM, Michel Suignard <michel at suignard.com>wrote:

>  I guess you should ask the owner, our distinguished president.
>
> Michel
>
>
>
> *From:* Unicode [mailto:unicode-bounces at unicode.org] *On Behalf Of *Andrew
> Pantyukhin
> *Sent:* Tuesday, January 14, 2014 4:06 PM
> *To:* unicode at unicode.org
> *Subject:* CJK IDS database
>
>
>
> Hi!
>
> I find Ideographic Description Sequences massively useful for studying and
> describing Chinese characters. However, I found only one comprehensive
> source of them: http://macchiato.com/ids/
>
>
> Does anyone know where the files come from? Were they part of the IRG
> process, or just an isolated effort? What are the private use characters in
> the sequences?
>
> I'd like to contribute to the IDS database and incorporate it into
> products like wiktionary and rikaikun.
>
>
>

From mpsuzuki at hiroshima-u.ac.jp  Wed Jan 15 00:10:54 2014
From: mpsuzuki at hiroshima-u.ac.jp (suzuki toshiya)
Date: Wed, 15 Jan 2014 15:10:54 +0900
Subject: ["Unicode"] Re: CJK IDS database
In-Reply-To: <CAJ2xs_H3pQ3DBKHS+za1G+kAex2bwSNSTbmk8RydxTKZdk5qow@mail.gmail.com>
References: <CA+qFSQ4w-0GESsNznsciFT-mx+jpTWH0E27MCzsCF3d7VG8wkA@mail.gmail.com>
 <18d0f28f79804234b9d301aa40f4bf32@CO1PR02MB157.namprd02.prod.outlook.com>
 <CAJ2xs_H3pQ3DBKHS+za1G+kAex2bwSNSTbmk8RydxTKZdk5qow@mail.gmail.com>
Message-ID: <52D6266E.6090104@hiroshima-u.ac.jp>

Hi,

Queries about the latest IDS collection are a periodic topic
on the Unihan mailing list, I think :-) The repository maintained
by Kawabata (technical editor of the IRG Working Document Set)
is now located at: https://github.com/cjkvi

# Users should be careful about whether the location of the
# repository has stabilized. It changes often (without
# notice of the new place to go), so don't be afraid to ask
# experts where to go.
# experts where to go.

Kawabata-san's work is based on CHISE database, which is
available at: http://git.chise.org/gitweb/?p=chise/ids.git

Regards,
mpsuzuki

Mark Davis ☕ wrote:
> Boy, I'd forgotten about those. There is an open-source collection of IDSs
> that I used to create those files. Unfortunately, I found that *that* data
> would take a lot of cleanup.
> 
> I do agree that it would be very useful to have an open-source repository
> of IDSs for Unicode characters, but I don't know of one. Others?
> 
> 
> Mark <https://google.com/+MarkDavis>
> 
> *- Il meglio è l'inimico del bene -*
> 
> 
> On Wed, Jan 15, 2014 at 4:36 AM, Michel Suignard <michel at suignard.com>wrote:
> 
>>  I guess you should ask the owner, our distinguished president.
>>
>> Michel
>>
>>
>>
>> *From:* Unicode [mailto:unicode-bounces at unicode.org] *On Behalf Of *Andrew
>> Pantyukhin
>> *Sent:* Tuesday, January 14, 2014 4:06 PM
>> *To:* unicode at unicode.org
>> *Subject:* CJK IDS database
>>
>>
>>
>> Hi!
>>
>> I find Ideographic Description Sequences massively useful for studying and
>> describing Chinese characters. However, I found only one comprehensive
>> source of them: http://macchiato.com/ids/
>>
>>
>> Does anyone know where the files come from? Were they part of the IRG
>> process, or just an isolated effort? What are the private use characters in
>> the sequences?
>>
>> I'd like to contribute to the IDS database and incorporate it into
>> products like wiktionary and rikaikun.
>>
>>
>>
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Unicode mailing list
> Unicode at unicode.org
> http://unicode.org/mailman/listinfo/unicode


From xn--mlform-iua at xn--mlform-iua.no  Wed Jan 15 21:43:05 2014
From: xn--mlform-iua at xn--mlform-iua.no (Leif Halvard Silli)
Date: Thu, 16 Jan 2014 04:43:05 +0100
Subject: Commercial minus as italic variant of division sign in German and
 Scandinavian context
Message-ID: <20140116044305293116.f28ead07@xn--mlform-iua.no>

Thanks to our discussion in July 2012,[1] the Unicode code charts now
say this about 00F7 ÷ DIVISION SIGN:

  “• occasionally used as an alternate, more visually
     distinct version of 2212 − {MINUS SIGN} or 2011 ‑
     {NON-BREAKING HYPHEN} in some contexts
        [… snip …]
   → 2052 ⁒ commercial minus sign”

However, I think it can also be added somewhere that commercial minus
is just the italic variant of “division minus”. I’ll hereby argue for
this based on an old German book on “commercial arithmetics” I have
come across, plus what the July 2012 discussion and Unicode
already say about the commercial sign:

FIRST: IDENTICAL CONTEXTS. 

   The German language is an important locale for the Commercial Minus. In
German, the Commercial Minus is referred to both as “kaufmännische
Minus(zeichen)” and as "buchhalterische Minus" (“Commercial Minus
Character” and “Bookkeeper Minus”). And, speaking of “division minus”
in the context I know best, Norway, we find it in advertising
(commercial context) and in bookkeeping documentation and taxation
forms. Simply put, what the Unicode 6.2 “General Punctuation” section
says about Commercial Minus can also be said about DIVISION SIGN used
as minus: “U+2052 ⁒ commercial minus sign is used in commercial or tax
related forms or publications in several European countries, including
Germany and Scandinavia.” So, basically and for the most part, the
commercial minus and the “division sign minus” occur in the very same
contexts, with very much the same meaning. This is a strong hint that
they are the same character.

SECOND: GERMAN USE OF DIVISION SIGN FOR MINUS IN COMMERCIAL CONTEXT.

   Is there any proof that German used both an italic variant and a
non-italic variant of the “division minus”? Seemingly yes: the book
“Kaufmännische Arithmetik” (“Commercial arithmetics”) from 1825 by
Johann Philipp Schellenberg. By reading section 118, “Anhang zur
Addition und Subtraction der Brüche” [“Appendix about the addition and
subtraction of fractions”], at page 213 and onwards,[2] we can conclude
that he describes a “commercial” use of the ÷ “division minus”, where
the ÷ signifies a _negative remainder_ of a division (while the plus
sign is used to signify a positive remainder). Or to quote, from page
214: “so wird das Fehlende durch das [Zei]chen ÷ (minus) bemerkt, und
bei Berechn[ung] der Preis der Waare abgezogen” [“then the lacking
remainder is marked with the ÷ (minus) and withdrawn when the price of
the commodity is calculated”]. {Note that some bits of the text are
lacking; I marked my guesses in square brackets.} I did not find (yet)
that he used the italic commercial minus; however, the context is
correct. (My guess is that the italic variant has been put to more
use in the computer age, partly to separate it from the DIVISION SIGN,
or maybe simply because people started to see it often in handwriting
but seldom in print, and so would not have recognized it in the form of
the non-italic division sign.)

THIRD: IDENTICAL INTERPRETATION

The word “abgezogen” in the above quote is interesting since the Code
Charts for 2052 ⁒ COMMERCIAL MINUS cite the related German word
“abzüglich”. And from the Swedish context, the charts quote the
expression “med avdrag”. An English translation might be “to be withdrawn”
or “with subtraction/rebate [for]”. Simply put, we here see the
commercial meaning.

WHAT ABOUT COMMERCIAL MINUS AS “CORRECT” SIGN IN SCANDINAVIAN SCHOOLS?

UNICODE 6.3 notes that in some European (e.g. Finnish, Swedish and 
perhaps Norwegian) traditions, teachers use the Commercial Minus Sign 
to signify that something is correct (whereas a red check mark is used 
to signify error). If my theory is right, that commercial minus and 
division sign minus are the same signs, how on earth is that possible? 
How can a minus sign count as positive for the student?

The answer is, I think, to be found in the Code Chart’s Swedish
description ("med avdrag"/"with subtraction/rebate"). I think
the correct understanding is not that it means "correct" or "OK".
Rather, it denotes something that is counted in the customer’s/student’s
favor. So you could say it really means "slack" or "rebate". So
it really means “good answer”. It is a “rebate” that the student
rightfully deserves.

FOURTH: A DEEPER MEANING

If we look at it from a very high level, then we can say that the 
division minus is used to signify something that is the result of a 
calculation - such as a price, an entry in bookkeeping or, indeed, a 
character/mark/point/score in a (home)work evaluated by a teacher - 
whereas the “normal” minus sign is used when we represent negative 
data. For example, in taxation, all the numbers one reports are the 
result of some calculation. Likewise, when a teacher ticks off an answer 
as “good answer”, it is because the teacher has evaluated (a.k.a. 
“calculated”) the answer and found it to be good, and found that the 
student has calculated correctly/well.

CIRCUMSTANTIAL EVIDENCE

The commercial minus looks like a percentage sign. And also, in 
programming, e.g. JavaScript, the percentage sign is often used for the 
modulo operator - which is an operator that finds the remainder of a 
division.
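
To make the remainder idea concrete, here is a minimal sketch in Python (chosen only for illustration; JavaScript's % behaves the same way for positive operands). The helper name signed_remainder is hypothetical, and reading Schellenberg's "negative remainder" as a shortfall from the next multiple is an assumption, not something taken from his book.

```python
def signed_remainder(dividend: int, divisor: int) -> tuple:
    """Return (quotient, remainder) with the remainder of smallest magnitude.

    Hypothetical helper: a remainder larger than half the divisor is
    rewritten as a shortfall (a negative remainder) from the next multiple.
    Assumes a positive divisor.
    """
    quotient, remainder = divmod(dividend, divisor)
    if 2 * remainder > divisor:
        return quotient + 1, remainder - divisor
    return quotient, remainder


print(17 % 5)                   # 2: the plain remainder, 17 = 3*5 + 2
print(signed_remainder(17, 5))  # (3, 2): positive remainder
print(signed_remainder(19, 5))  # (4, -1): 19 = 4*5 - 1, a "lacking" remainder
```

In this reading, the plus sign would mark the first kind of result and the division minus the second.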

Hence, when we take all this together, I believe we have to conclude 
that the COMMERCIAL MINUS is just the italic variant of the DIVISION 
SIGN.

PS: For more German documentation of this custom, it would probably be 
wise to research books about bookkeeping as well as “commercial 
arithmetics”. I also have a suspicion that it would be worth 
investigating contexts where modulo/division remainder operations are 
found - for instance, in calendar calculations.

[1] http://www.unicode.org/mail-arch/unicode-ml/y2012-m07/0053.html
[2] 
https://archive.org/stream/kaufmnnischeari00schegoog#page/n229/mode/2up
-- 
leif halvard silli


From asmusf at ix.netcom.com  Thu Jan 16 01:17:46 2014
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Wed, 15 Jan 2014 23:17:46 -0800
Subject: Commercial minus as italic variant of division sign in German
 and Scandinavian context
In-Reply-To: <20140116044305293116.f28ead07@xn--mlform-iua.no>
References: <20140116044305293116.f28ead07@xn--mlform-iua.no>
Message-ID: <52D7879A.70103@ix.netcom.com>

I find it unhelpful to consider 2052 as the italic variant of 00F7, and
further find the "evidence" for that not all that germane.

Both are variants of the "-" sign, and so ipso facto are variants of
each other.

However, to identify something as "italic" to me would require that
one form is used in the context of italic fonts, while the other is not.

I cannot see anything supporting that interpretation in the "evidence"
adduced below.

On the contrary, you would expect both forms available in sans-serif
and typewriter fonts (those being perhaps the most common for
accounting), and perhaps also roman.

Further, while italic (as well as oblique fonts) tend to slant the letter
forms, there's not a universal, established practice of turning horizontal
dashes into slashes to mark the alternation between roman and
italic fonts. From that perspective, considering one the "italic"
variant of the other also appears to be a non-starter.

However, it seems to be possible to establish that these two
characters are indeed rather close variants: both are used
to visually emphasize the minus sign by means of decorating
it with a pair of dots. And both are employed in situations that
have a large semantic overlap. (Not surprisingly, because their
meaning is based on the minus sign.)

The choice of variant, though, is driven by context and tradition
for a given type of document, not by choice of font style.
And, the choice of using 2052 instead of hyphen-minus or minus
is deliberate and conscious, making it an alternate spelling rather
than an alternate "glyph".

If 00F7 can be used to stand in as a marked 2011, as claimed in
the Unicode namelist annotation, then that use is clearly NOT
as a variant of 2052, because 2011 does not have
any connotations of negation. That means the semantic
relations between 00F7 and 2052 only partially overlap, which
is yet another indication that thinking of one as a font-style
variant of the other is not particularly helpful - even if the
ultimate origin may have derived from the same sign.

At this stage of the game, they are properly disunified,
just as i and j or u and v.

A./




From jknappen at web.de  Thu Jan 16 02:26:10 2014
From: jknappen at web.de (=?UTF-8?Q?=22J=C3=B6rg_Knappen=22?=)
Date: Thu, 16 Jan 2014 09:26:10 +0100 (CET)
Subject: Aw: Commercial minus as italic variant of division sign in German
 and Scandinavian context
In-Reply-To: <20140116044305293116.f28ead07@xn--mlform-iua.no>
References: <20140116044305293116.f28ead07@xn--mlform-iua.no>
Message-ID: <trinity-2374a116-26e9-4c7d-8af3-0f641b7d73e3-1389860770620@3capp-webde-bs10>

An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20140116/53e17b21/attachment.html>

From xn--mlform-iua at xn--mlform-iua.no  Thu Jan 16 07:34:23 2014
From: xn--mlform-iua at xn--mlform-iua.no (Leif Halvard Silli)
Date: Thu, 16 Jan 2014 14:34:23 +0100
Subject: Commercial minus as italic variant of division sign in German
 and Scandinavian context
In-Reply-To: <52D7879A.70103@ix.netcom.com>
References: <20140116044305293116.f28ead07@xn--mlform-iua.no>
 <52D7879A.70103@ix.netcom.com>
Message-ID: <20140116143423686172.a3f32e12@xn--mlform-iua.no>

Asmus, 

I am not certain that commercial minus isn’t sometimes used as italics 
for the “division sign minus”. For instance, when looking at my message 
in Firefox [1], the commercial minus looks like a “handwritten” variant 
of the division sign. I think it would be entirely possible to use a 
commercial minus that looks that way in a Norwegian taxation form, 
for instance. (I attach a screenshot of it.) I suspect that it is a 
monospace Courier font. 

Also, I wonder about the claim in the General Punctuation section that 
commercial minus is used in taxation forms in Scandinavia and Germany. 
I would dearly like to see the evidence for that claim. I must say that 
I suspect that the use of the division sign in Norwegian taxation forms 
for this purpose has been counted in as evidence for that claim - 
could it be that our “straight commercial minus” was counted as, well, 
a commercial minus? Could it be that the wish to see oneself - or us - 
in the “German tradition” made one draw the wrong conclusion about 
which character we use?

Anyway, when I spoke of 2052 as an italic version of 00F7, I meant it in 
a kind of “mathematical” sense: Unicode, for instance, contains both 
MATHEMATICAL BOLD ITALIC CAPITAL A and MATHEMATICAL BOLD CAPITAL A, 
and even if they are (I believe) used for different mathematical 
purposes, everyone sees and knows that they are variants of one and the 
same letter - the capital A. And also, in some contexts, one might be 
able to use a normal capital A instead of the mathematical ones.

The same knowledge is not present about 00F7 and 2052. The best would 
have been if the two characters shared a similar name. For instance, if 
00F7 got an additional, synonymous name, like STRAIGHT COMMERCIAL 
MINUS or, perhaps better, COMMERCIAL HYPHEN-MINUS, then the 
relationship would be clear - or at least clearer. As MATHEMATICAL 
BOLD ITALIC CAPITAL A and MATHEMATICAL BOLD CAPITAL A show, two 
characters do not need to be 100% synonymous just because their names 
differ only in a stylistic way, so to speak.
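
The difference between "related by name" and "related in the character data" can be checked mechanically. Below is a minimal sketch, assuming Python's standard unicodedata module; the helper name describe is made up for illustration. It prints each character's formal name and what NFKC compatibility normalization turns it into: the two mathematical A's fold to the plain capital A, while 00F7 and 2052 carry no such mapping and stay as they are.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME -> code points after NFKC normalization'."""
    folded = unicodedata.normalize("NFKC", ch)
    targets = " ".join(f"U+{ord(c):04X}" for c in folded)
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {targets}"


# Two styled mathematical letters, then the two signs under discussion.
for ch in ("\U0001D400", "\U0001D468", "\u00F7", "\u2052"):
    print(describe(ch))
```

So the link between the mathematical letters is recorded in the data itself, whereas any link between 00F7 and 2052 currently lives only in annotations and cross-references.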

When reading Unicode, one is only left to guess about the relationship 
between 00F7 and 2052. For instance, 2052 is described in the General 
Punctuation section - and distinguished there from the “normal” minus 
and hyphen-minus - whereas 00F7 is not described there. A sentence there 
saying that, in some countries, it is actually the 00F7 and not the 
2052 that is used would be very helpful and enlightening. Likewise, 
there is no description of 00F7 amongst the dashes/hyphens.

You wrote:

> Further, while italic (as well as oblique fonts) tend to slant the letter
> forms, there's not a universal, established practice of turning horizontal
> dashes into slashes to mark the alternation between roman and
> italic fonts. From that perspective, considering one the "italic"
> variant of the other also appears to be a non-starter.

Right. And I can only underline once more that I meant “italic” as part 
of the name, see above.

You:

> However, it seems to be possible to establish that these two
> characters are indeed rather close variants: […]

Indeed.

> The choice of variant, though, is driven by context and tradition
> for a given type of document, not by choice of font style.
> And, the choice of using 2052 instead of hyphen-minus or minus
> is deliberate and conscious, making it an alternate spelling rather
> than an alternate "glyph".

Well, yes.

> If 00F7 can be used to stand in as a marked 2011, as claimed in
> the Unicode namelist annotation then that use is clearly NOT
> as a variant of 2052, because 2011 does not have
> any connotations of negation.

It is an argument for seeing 00F7 as (also) a hyphen-minus variant, no?

> That means the semantic
> relations between 00F7 and 2052 only partially overlap, which
> is yet another indication that thinking of one as a font-style
> variant of the other is not particularly helpful - even if the
> ultimate origin may have derived from the same sign.
> 
> At this stage of the game, they are properly disunified,
> just as i and j or u and v.

I am not really arguing for their unification - which anyhow is 
impossible, if I have understood the stability rules of Unicode. 
(Whereas an *additional* name is not ruled out, if I got it right.) I 
am “only” arguing that Unicode should take in information that clearly 
links the two together. As it is today, no one seems to realize how 
commercial minus relates to “division sign minus”.


[1] http://unicode.org/pipermail/unicode/2014-January/000013.html
[2] Attachment: the file "screenshot-of-minuses.png"
-------------- next part --------------
A non-text attachment was scrubbed...
Name: screenshot-of-minuses.png
Type: image/png
Size: 9512 bytes
Desc: not available
URL: <http://unicode.org/pipermail/unicode/attachments/20140116/a495cb4b/attachment.png>
-------------- next part --------------


Leif Halvard Silli


From xn--mlform-iua at xn--mlform-iua.no  Thu Jan 16 07:54:55 2014
From: xn--mlform-iua at xn--mlform-iua.no (Leif Halvard Silli)
Date: Thu, 16 Jan 2014 14:54:55 +0100
Subject: Aw: Commercial minus as italic variant of division sign in
 German and Scandinavian context
In-Reply-To: <trinity-2374a116-26e9-4c7d-8af3-0f641b7d73e3-1389860770620@3capp-webde-bs10>
References: <20140116044305293116.f28ead07@xn--mlform-iua.no>
 <trinity-2374a116-26e9-4c7d-8af3-0f641b7d73e3-1389860770620@3capp-webde-bs10>
Message-ID: <20140116145455777875.bc3e6891@xn--mlform-iua.no>

"Jörg Knappen", Thu, 16 Jan 2014 09:26:10 +0100 (CET):
> The most important word in the comment on 00F7 ÷ DIVISION SIGN is 
> "occasionally".
> 
> In fact, the occasions are such rare that you can live a whole life 
> in germany without encountering one of them.
> 
> On the other hand, 00F7 ÷ DIVISION SIGN is used _frequently_ in 
> german schoolbooks to denote ...
> division (books aimed at professionals doing math prefer : (COLON) or 
> / (SLASH) for this purpose, but schoolbooks don't).

This sounds like Norway ...

> 2052 ⁒ commercial minus sign _always_ means subtraction and it has 
> this shape (or the alternate shape ./.)
> in all contexts, roman or italic. It is not the italic version of 
> some other symbol.

So, I can only once more emphasize that when I said “italics” I meant 
it in the way that Unicode already has many characters (primarily 
mathematical ones) which are distinguished, in name, only by a reference 
to the style of the letter. Hope this helps.

As for the clarity of 2052 ⁒ commercial minus sign: no, you are wrong. 
While it may be clear to you in Germany, at least in some 
Scandinavian school contexts it has a different meaning, namely as a 
“well done” sign from the teacher.

As for the Norwegian context, I guess we can say that the use of ÷ 
DIVISION SIGN as a minus sign is more on the way down than up. But it 
has its contexts (just last week, I received an ad for glasses where 
it was used), and no one thinks about it. It is not an issue. When we 
get the taxation form on paper or in PDF form, the division minus is 
there, and everyone understands it correctly. (Knock on wood - *some* 
probably stumble.) They don’t even realize what they see - it is 
knowledge that is unaccounted for. (For instance, until I took this up, 
Wikipedia made no mention of it. Hah! Even Unicode 6.3 talks about the 
“commercial minus sign” in _Scandinavian_ taxation forms without (this 
is my claim) understanding that it is talking about DIVISION SIGN. See 
my reply to Asmus.)

So what I don’t want is that the “untraditional” uses of ÷ DIVISION 
SIGN are left in the dark as some strange traditions without any roots. 
Also, I don't want the commercial minus to live a life as if it is such 
a unique thing. Let us document things properly.

Leif Halvard Silli


> Sent: Thursday, 16 January 2014 at 04:43
> From: "Leif Halvard Silli" <xn--mlform-iua at målform.no>
> To: unicode at unicode.org
> Subject: Commercial minus as italic variant of division sign in 
> German and Scandinavian context
> Thanks to our discussion in July 2012,[1] the Unicode code charts now
> say, about 00F7 ÷ DIVISION SIGN, this:
> 
> “• occasionally used as an alternate, more visually
> distinct version of 2212 − {MINUS SIGN} or 2011 ‑
> {NON-BREAKING HYPHEN} in some contexts
> [… snip …]
> → 2052 ⁒ commercial minus sign”
> 
> However, I think it can also be added somewhere that commercial minus
> is just the italic variant of “division minus”. I’ll hereby argue for
> this based on an old German book on “commercial arithmetics” I have
> come across, plus the July 2012 discussion and what Unicode
> already says about the commercial sign:
> 
> FIRST: IDENTICAL CONTEXTS.
> 
> German language is an important locale for the Commercial Minus. In
> German, the Commercial minus is both referred to as “kaufmännische
> Minus(zeichen)” and as "buchhalterische Minus" (“Commercial Minus
> Character” and “Bookkeeper Minus”). And, speaking of “division minus”
> in the context I know best, Norway, we find it in advertising
> (commercial context) and in bookkeeping documentation and taxation
> forms. Simply put, what the Unicode 6.2 “General Punctuation” section
> says about Commercial Minus can also be said about DIVISION SIGN used
> as minus: “U+2052 ⁒ commercial minus sign is used in commercial or tax
> related forms or publications in several European countries, including
> Germany and Scandinavia.” So, basically and for the most part, the
> commercial minus and the “division sign minus” occur in the very same
> contexts, with very much the same meaning. This is a strong hint that
> they are the same character.
> 
> SECOND: GERMAN USE OF DIVISION SIGN FOR MINUS IN COMMERCIAL CONTEXT.
> 
> Is there any proof that German used both an italics variant and a
> non-italics variant of the “division minus”? Seemingly yes: the book
> “Kaufmännische Arithmetik” (“Commercial arithmetics”) from 1825 by
> Johann Philipp Schellenberg. By reading section 118 “Anhang zur
> Addition und Subtraction der Brüche” [“Appendix about the addition and
> subtraction of fractions”] at page 213 and onwards,[2] we can conclude
> that he describes a “commercial” use of the ÷ “division minus”, where
> the ÷ signifies a _negative remainder_ of a division (while the plus
> sign is used to signify a positive remainder). Or to quote, from page
> 214: “so wird das Fehlende durch das [Zei]chen ÷ (minus) bemerkt, und
> bei Berechn[ung] der Preis der Waare abgezogen” [“then the lacking
> remainder is marked with the ÷ (minus) and withdrawn when the price of
> the commodity is calculated”]. {Note that some bits of the text are
> lacking; I marked my guesses in square brackets.} I have not (yet) found
> that he used the italic commercial minus; however, the context is
> correct. (My guess is that the italic variant has been put to more
> use in the computer age, partly to separate it from the DIVISION SIGN,
> or maybe simply because people started to see it often in handwriting
> but seldom in print, and so would not have recognized it in the form of
> the non-italic division sign.)
> 
> THIRD: IDENTICAL INTERPRETATION
> 
> The word “abgezogen” in the above quote is interesting since the Code
> Charts for 2052 ⁒ COMMERCIAL MINUS cite the related German word
> “abzüglich”. And from the Swedish context, the charts quote the
> expression “med avdrag”. An English translation might be “to be withdrawn”
> or “with subtraction/rebate [for]”. Simply put, we here see the
> commercial meaning.
> 
> WHAT ABOUT COMMERCIAL MINUS AS “CORRECT” SIGN IN SCANDINAVIAN SCHOOLS?
> 
> Unicode 6.3 notes that in some European (e.g. Finnish, Swedish and
> perhaps Norwegian) traditions, teachers use the Commercial Minus Sign
> to signify that something is correct (whereas a red check mark is used
> to signify error). If my theory is right, that commercial minus and
> division sign minus are the same sign, how on earth is that possible?
> How can a minus sign count as positive for the student?
> 
> The answer is, I think, to be found in the Code Chart’s Swedish
> description ("med avdrag"/"with subtraction/rebate"). Because, I think
> that the correct understanding is not that it means "correct" or "OK".
> Rather, it denotes something that is counted in the customer/student’s
> favor. So, you could say it really means "slack", or "rebate". So
> it really means “good answer”. It is a “rebate” that the student
> rightfully deserves.
> 
> FOURTH: A DEEPER MEANING
> 
> If we look at it from a very high level, then we can say that the
> division minus is used to signify something that is the result of a
> calculation - such as a price, an entry in bookkeeping or, indeed, a
> character/mark/point/score in a (home)work evaluated by a teacher.
> Whereas the “normal” minus sign is used when we represent negative
> data. For example, in taxation, all the numbers one reports are the
> result of some calculation. Likewise, when a teacher ticks off an answer
> as “good answer”, then it is because the teacher has evaluated (a.k.a.
> “calculated”) the answer and found it to be good and that the student
> has calculated correctly/well.
> 
> CIRCUMSTANTIAL EVIDENCE
> 
> The commercial minus looks like a percentage sign. And also, in
> programming, e.g. JavaScript, the percentage sign is often used for the
> modulo operator - which is an operator that finds the remainder of a
> division.
> 
> Hence, when we take all this together, I believe we have to conclude
> that the COMMERCIAL MINUS is just the italic variant of the DIVISION
> SIGN.
> 
> PS: For more German documentation of this custom, it would probably be
> wise to research books about bookkeeping as well as “commercial
> arithmetics”. I also have a suspicion that it would be worth
> investigating contexts where modulo/division-remainder operations are
> found - for instance, in calendar calculations.
> 
> [1] http://www.unicode.org/mail-arch/unicode-ml/y2012-m07/0053.html
> [2]
> https://archive.org/stream/kaufmnnischeari00schegoog#page/n229/mode/2up
> --
> leif halvard silli
> 
> _______________________________________________
> Unicode mailing list
> Unicode at unicode.org
> http://unicode.org/mailman/listinfo/unicode


From asmusf at ix.netcom.com  Thu Jan 16 09:24:45 2014
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Thu, 16 Jan 2014 07:24:45 -0800
Subject: Commercial minus as italic variant of division sign in German
 and Scandinavian context
In-Reply-To: <20140116143423686172.a3f32e12@xn--mlform-iua.no>
References: <20140116044305293116.f28ead07@xn--mlform-iua.no>
 <52D7879A.70103@ix.netcom.com>
 <20140116143423686172.a3f32e12@xn--mlform-iua.no>
Message-ID: <52D7F9BD.8060106@ix.netcom.com>

On 1/16/2014 5:34 AM, Leif Halvard Silli wrote:
> Asmus,
>
> I am not certain that commercial minus isn’t sometimes used as italics
> for the “division sign minus”. For instance, when looking at my message
> in Firefox [1], the commercial minus looks like a “handwritten” variant
> of the division sign. I think it would be entirely possible to use a
> commercial minus that looks that way in a Norwegian taxation formulary,
> for instance. (I attach a screenshot of it.) I suspect that it is a
> monospace Courier font.
The screen shot indeed shows a glyph for 2052 that superficially looks
like a *reverse* (!) oblique variant of the glyph for 00F7. I say
"superficially" because the other distinction is the use of heavier dots.

However, the fact that the "slant" is reverse, rather than forward, is
contrary to the way oblique or italic fonts usually work.

So, again, I find your suggestion of "italic variant" not helpful.
>
> Also, I wonder about the claim in the General Punctuation section that
> commercial minus is used in taxation forms in Scandinavia and Germany.
> I would dearly like to see the evidence for that claim. I must say that
> I suspect that the use of the division sign in Norwegian taxation forms
> for this purpose has been counted in as evidence for that claim -
> could it be that our “straight commercial minus” was counted as, well,
> a commercial minus? Could it be that the wish to see oneself - or us -
> in the “German tradition”, made one draw the wrong conclusion about
> which character we use?

I would not be surprised if the actual situation is a bit more detailed
than expressed in Unicode's namelist annotations (or even the
descriptions in the chapter texts).

However, I can't assist you in tracking those down as I have access to
no taxation forms that use any of these characters. :)
>
> Anyway, when I spoke of 2052 as an italic version of 00F7, I meant in
> the, kind of, “mathematical” sense: Unicode for instance contains both
> MATHEMATICAL BOLD ITALIC CAPITAL A, and MATHEMATICAL BOLD CAPITAL A,
> and even if they are (I believe) used for different mathematical
> purposes, everyone sees and knows that they are variants of one and the
> same letter - the capital A. And also, in some contexts, one might be
> able to use a normal capital A instead of the mathematical ones.

This is getting even less helpful.

The mathematical alphabets exist, because in mathematics, you cannot
substitute one shape for another without destroying the semantics (and
there are general conventions about what shape to use where).

The latter is similar to the uses of 00F7 and 2052 both. There are
conventions where each of them is appropriate and these conventions
depend on rather selected user communities (school books, tax forms,
accounting, math), just like the use of certain mathematical alphabet
styles in physics may not be shared in all mathematical disciplines.

Where the case for 00F7 and 2052 differs from the mathematical alphabets
is that in the latter case the shape variants are (to a very large
extent) accurately described by the typographical moniker. A bold is a
bold.

The only exception that I can think of is in the realm of "script",
where some authors prefer a slightly different style that isn't tied to
18th century copperplate.
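
One rough way to see this difference in the character data itself (a
sketch using Python's standard unicodedata module; the helper name
nfkc() is only illustrative): the mathematical alphabet letters carry a
compatibility mapping back to the plain letter, so NFKC folds them to
"A", while 00F7 and 2052 have no such mapping to each other or to any
minus sign.

```python
import unicodedata

MATH_BOLD_A = "\U0001D400"         # MATHEMATICAL BOLD CAPITAL A
MATH_BOLD_ITALIC_A = "\U0001D468"  # MATHEMATICAL BOLD ITALIC CAPITAL A
DIVISION_SIGN = "\u00F7"           # DIVISION SIGN
COMMERCIAL_MINUS = "\u2052"        # COMMERCIAL MINUS SIGN


def nfkc(ch: str) -> str:
    """Apply compatibility normalization (NFKC) to a string."""
    return unicodedata.normalize("NFKC", ch)


if __name__ == "__main__":
    for ch in (MATH_BOLD_A, MATH_BOLD_ITALIC_A, DIVISION_SIGN, COMMERCIAL_MINUS):
        # Print the character name, then what NFKC turns it into.
        print(f"{unicodedata.name(ch)} -> U+{ord(nfkc(ch)):04X}")
```

The folding throws away exactly the distinction that mathematical text
relies on, so it only shows where a styled-variant relation is recorded,
not that the variants are interchangeable.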
>
> The same knowledge is not present about 00F7 and 2052. The best would
> have been if the two characters shared a similar name. For instance, if
> 00F7 got an additional, synonymous name, like STRAIGHT COMMERCIAL
> MINUS, or perhaps, better, COMMERCIAL HYPHEN-MINUS. Then the
> relationship would be clear - or at least clearer. Like MATHEMATICAL
> BOLD ITALIC CAPITAL A, and MATHEMATICAL BOLD CAPITAL A show, two
> characters do not need to be 100% synonymous just because their names
> only differ in a stylistic way, so to speak.
Well, 00F7 is *most often* used as a division sign. Check calculator keys.
>
> When reading Unicode, one is only left to guess about the relationship
> between 00F7 and 2052. For instance, 2052 is described in the general
> punctuation - and distinguished there from the “normal” minus and
> hyphen-minus, whereas 00F7 is not described there. A sentence, there,
> that said that, in some countries, it is actually the 00F7 and not the
> 2052, that is used, would be very helpful and enlightening.  Likewise,
> there is no description of 00F7 amongst the dashes/hyphens.

Suggest better text for the book chapter that details the precise places
that have been established as using 00F7 in the capacity of "minus sign".
That would be more helpful than trying to somehow treat 00F7 and 2052 as
glyphic variants of each other. They are separate characters, with
distinct usage conventions that simply happen to employ both a line and
two dots. (The fallback of ./. for 2052 is interesting in this context).
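
For reference, the formal names of the code points cited in this
exchange can be listed with Python's standard unicodedata module. This
is only a lookup sketch (the describe() helper is illustrative); it
says nothing about usage conventions, which is the part that needs
documenting.

```python
import unicodedata

# Code points cited by hex value in this thread.
CODE_POINTS = [0x002D, 0x00F7, 0x2011, 0x2052, 0x2212]


def describe(cp: int) -> str:
    """Return 'U+XXXX NAME' for a single code point."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"


if __name__ == "__main__":
    for cp in CODE_POINTS:
        print(describe(cp))
```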
>
> You wrote:
>
>> Further, while italic (as well as oblique fonts) tend to slant the letter
>> forms, there's not a universal, established practice of turning horizontal
>> dashes into slashes to mark the alternation between roman and
>> italic fonts. From that perspective, considering one the "italic"
>> variant of the other also appears to be a non-starter.
> Right. And I can only underline once more that I meant “italic” as part
> of the name, see above.
Actually, as I wrote at the top, you'd need "reverse italic" and in
general, trying to establish this relation is a red herring. It does not
improve the user experience.
>
> You:
>
>> However, it seems to be possible to establish that these two
>> characters are indeed rather close variants: […]
> Indeed.
Less close than it appears, because when I wrote this I did not include
the notion of the most common use of 00F7, which is indeed for DIVISION.
I was focused only on the minority use of 00F7 as a minus sign, in which
case it and 2052 AND 002D and 2212 all function as variants of each other
(but not as glyphic variants --- they are spelling variants).
>
>> The choice of variant, though, is driven by context and tradition
>> for a given type of document, not by choice of font style.
>> And, the choice of using 2052 instead of hyphen-minus or minus
>> is deliberate and conscious, making it an alternate spelling rather
>> than an alternate "glyph".
> Well, yes.
Because it's spelling, the "italic" is a red herring.
>> If 00F7 can be used to stand in as a marked 2011, as claimed in
>> the Unicode namelist annotation then that use is clearly NOT
>> as a variant of 2052, because 2011 does not have
>> any connotations of negation.
> It is an argument for seeing 00F7 as (also) a hyphen-minus variant, no?
Once you get into the dashes, there's tons of variant usage. What's 
documented in Unicode tends to be from predominantly English-language 
style manuals, but if you extend this to all publications in all 
(Western) languages including recent historic times, I'm sure you'd find 
surprising variations.

For quotation marks we ran this to earth and the story is truly complex.
>
>> That means the semantic
>> relations between 00F7 and 2052 only partially overlap, which
>> is yet another indication that thinking of one as a font-style
>> variant of the other is not particularly helpful - even if the
>> ultimate origin may have derived from the same sign.
>>
>> At this stage of the game, they are properly disunified,
>> just as i and j or u and v.
> I am not really arguing for their unification - which anyhow is
> impossible, if I have understood the stability rules of Unicode.
> (Whereas an *additional* name is not ruled out, if I got it right.) I
> am “only” arguing that Unicode should take in information that clearly
> links the two together. As it is today, no one seems to realize how
> commercial minus relates to “division sign minus”.
"additional" names are ruled out - except to fix something that's badly
broken. Neither of these characters has a name that is misleading,
mistyped or both.

There are many characters with deep relations that many users do not know
about. And, in this case, there seem to be some issues with the precise
relation you are trying to implement.

A./
>
>
> [1] http://unicode.org/pipermail/unicode/2014-January/000013.html
> [2] attachment of the file “screenshot-of-minuses.png”
>
>
>
> Leif Halvard Silli
>
> Asmus Freytag, Wed, 15 Jan 2014 23:17:46 -0800:
>> I find it unhelpful to consider 2052 as the italic variant of 00F7, and
>> further find the "evidence" for that not all that germane.
>>
>> Both are variants of the "-" sign, and so ipso facto are variants of
>> each other.
>>
>> However, to identify something as "italic" to me would require that
>> one form is used in the context of italic fonts, while the other is not.
>>
>> I cannot see anything supporting that interpretation in the "evidence"
>> adduced below.
>>
>> On the contrary, you would expect both forms available in sans-serif
>> and typewriter fonts (those being perhaps the most common for
>> accounting), and perhaps also roman.
>>
>> Further, while italic (as well as oblique fonts) tend to slant the letter
>> forms, there's not a universal, established practice of turning horizontal
>> dashes into slashes to mark the alternation between roman and
>> italic fonts. From that perspective, considering one the "italic"
>> variant of the other also appears to be a non-starter.
>>
>> However, it seems to be possible to establish that these two
>> characters are indeed rather close variants: both are used
>> to visually emphasize the minus sign by means of decorating
>> it with a pair of dots. And both are employed in situations that
>> have a large semantic overlap. (Not surprisingly, because their
>> meaning is based on the minus sign).
>>
>> The choice of variant, though, is driven by context and tradition
>> for a given type of document, not by choice of font style.
>> And, the choice of using 2052 instead of hyphen-minus or minus
>> is deliberate and conscious, making it an alternate spelling rather
>> than an alternate "glyph".
>>
>> If 00F7 can be used to stand in as a marked 2011, as claimed in
>> the Unicode namelist annotation then that use is clearly NOT
>> as a variant of 2052, because 2011 does not have
>> any connotations of negation. That means the semantic
>> relations between 00F7 and 2052 only partially overlap, which
>> is yet another indication that thinking of one as a font-style
>> variant of the other is not particularly helpful - even if the
>> ultimate origin may have derived from the same sign.
>>
>> At this stage of the game, they are properly disunified,
>> just as i and j or u and v.
>>
>> A./
>>
>>
>>
>>
>> On 1/15/2014 7:43 PM, Leif Halvard Silli wrote:
>>> Thanks to our discussion in July 2012,[1] the Unicode code charts now
>>> say, about 00F7 ÷ DIVISION SIGN, this:
>>>
>>>     “• occasionally used as an alternate, more visually
>>>        distinct version of 2212 − {MINUS SIGN} or 2011 ‑
>>>        {NON-BREAKING HYPHEN} in some contexts
>>>           [… snip …]
>>>      → 2052 ⁒ commercial minus sign”
>>>
>>> However, I think it can also be added somewhere that commercial minus
>>> is just the italic variant of “division minus”. I’ll hereby argue for
>>> this based on an old German book on “commercial arithmetics” I have
>>> come across, plus the July 2012 discussion and what Unicode
>>> already says about the commercial sign:
>>>
>>> FIRST: IDENTICAL CONTEXTS.
>>>
>>>      German language is an important locale for the Commercial Minus. In
>>> German, the Commercial minus is both referred to as “kaufmännische
>>> Minus(zeichen)” and as "buchhalterische Minus" (“Commercial Minus
>>> Character” and “Bookkeeper Minus”). And, speaking of “division minus”
>>> in the context I know best, Norway, we find it in advertising
>>> (commercial context) and in book keeping documentation and taxation
>>> forms. Simply put, what the Unicode 6.2 “General Punctuation” section
>>> says about Commercial Minus, can also be said about DIVISION SIGN used
>>> as minus: “U+2052 % commercial minus sign is used in commercial or tax
>>> related forms or publications in several European countries, including
>>> Germany and Scandinavia.” So, basically and for the most part, the
>>> commercial minus and the “division sign minus” occur in the very same
>>> contexts, with very much the same meaning. This is a strong hint that
>>> they are the same character.
>>>
>>> SECOND: GERMAN USE OF DIVISION SIGN FOR MINUS IN COMMERCIAL CONTEXT.
>>>
>>>      Is there any proof that German used both an italics variant and a
>>> non-italics variant of the “division minus”? Seemingly yes: see the book
>>> “Kaufmännische Arithmetik” (“Commercial arithmetics”) from 1825 by
>>> Johann Philipp Schellenberg. By reading section 118 “Anhang zur
>>> Addition und Subtraction der Brüche” [“Appendix about the addition and
>>> subtraction of fractions”] at page 213 and onwards,[2] we can conclude
>>> that he describes a “commercial” use of the ÷ “division minus”, where
>>> the ÷ signifies a _negative remainder_ of a division (while the plus
>>> sign is used to signify a positive remainder). Or to quote, from page
>>> 214: “so wird das Fehlende durch das [Zei]chen ÷ (minus) bemerkt, und
>>> bei Berechn[ung] der Preis der Waare abgezogen” [“then the lacking
>>> remainder is marked with the ÷ (minus) and withdrawn when the price of
>>> the commodity is calculated”]. {Note that some bits of the text are
>>> lacking; I marked my guesses in square brackets.} I did not find (yet)
>>> that he used the italic commercial minus; however, the context is
>>> correct. (My guess is that the italics variant has been put to more
>>> use, in the computer age, partly to separate it from the DIVISION SIGN
>>> or maybe simply because people started to see it often in handwriting
>>> but seldom in print. And so would not have recognized it in the form of
>>> the non-italic division sign.)
>>>
>>> THIRD: IDENTICAL INTERPRETATION
>>>
>>> The word “abgezogen” in the above quote is interesting since the Code
>>> Charts for 2052 ⁒ COMMERCIAL MINUS cite the related German word
>>> “abzüglich”. And from the Swedish context, the charts quote the
>>> expression “med avdrag”. English translation might be “to be withdrawn”
>>> or “with subtraction/rebate [for]”. Simply put, we here see the
>>> commercial meaning.
>>>
>>> WHAT ABOUT COMMERCIAL MINUS AS “CORRECT” SIGN IN SCANDINAVIAN SCHOOLS?
>>>
>>> UNICODE 6.3 notes that in some European (e.g. Finnish, Swedish and
>>> perhaps Norwegian) traditions, teachers use the Commercial Minus Sign
>>> to signify that something is correct (whereas a red check mark is used
>>> to signify error). If my theory is right, that commercial minus and
>>> division sign minus are the same signs, how on earth is that possible?
>>> How can a minus sign count as positive for the student?
>>>
>>> The answer is, I think, to be found in the Code Chart’s Swedish
>>> description ("med avdrag"/"with subtraction/rebate"). Because, I think
>>> that the correct understanding is not that it means "correct" or "OK".
>>> Rather, it denotes something that is counted in the customer/student’s
>>> favor. So, you could say it really means "slack", or "rebate". So
>>> it really means “good answer”. It is a “rebate” that the student
>>> rightfully deserves.
>>>
>>> FOURTH: A DEEPER MEANING
>>>
>>> If we look at it from a very high level, then we can say that the
>>> division minus is used to signify something that is the result of a
>>> calculation - such as a price, an entry in bookkeeping or, indeed, a
>>> character/mark/point/score in a (home)work evaluated by a teacher.
>>> Whereas the “normal” minus sign is used when we represent negative
>>> data. For example, in taxation, all the numbers one reports are the
>>> result of some calculation. Likewise, when a teacher ticks off an answer
>>> as “good answer”, then it is because the teacher has evaluated (a.k.a.
>>> “calculated”) the answer and found it to be good and that the student
>>> has calculated correctly/well.
>>>
>>> CIRCUMSTANTIAL EVIDENCE
>>>
>>> The commercial minus looks like a percentage sign. And also, in
>>> programming, e.g. JavaScript, the percentage sign is often used for the
>>> modulo operator - which is an operator that finds the remainder of a
>>> division.
>>>
>>> Hence, when we take all this together, I believe we have to conclude
>>> that the COMMERCIAL MINUS is just the italic variant of the DIVISION
>>> SIGN.
>>>
>>> PS: For more German documentation of this custom, it would probably be
>>> wise to research books about bookkeeping as well as “commercial
>>> arithmetics”. I also have a suspicion that it would be worth
>>> investigating contexts where modulo/division-remainder operations are
>>> found - for instance, in calendar calculations.
>>>
>>> [1] http://www.unicode.org/mail-arch/unicode-ml/y2012-m07/0053.html
>>> [2]
>>> https://archive.org/stream/kaufmnnischeari00schegoog#page/n229/mode/2up


From xn--mlform-iua at xn--mlform-iua.no  Thu Jan 16 10:12:17 2014
From: xn--mlform-iua at xn--mlform-iua.no (Leif Halvard Silli)
Date: Thu, 16 Jan 2014 17:12:17 +0100
Subject: Commercial minus as italic variant of division sign in German
 and Scandinavian context
In-Reply-To: <52D7F9BD.8060106@ix.netcom.com>
References: <20140116044305293116.f28ead07@xn--mlform-iua.no>
 <52D7879A.70103@ix.netcom.com>
 <20140116143423686172.a3f32e12@xn--mlform-iua.no>
 <52D7F9BD.8060106@ix.netcom.com>
Message-ID: <20140116171217417407.024080bd@xn--mlform-iua.no>

Asmus Freytag, Thu, 16 Jan 2014 07:24:45 -0800:
> On 1/16/2014 5:34 AM, Leif Halvard Silli wrote:

>> when looking at my message in Firefox [1], the commercial minus
>> looks like a “handwritten” variant of the division sign.

> the fact that the "slant" is reverse, rather than forward, 
> is contrary to the way oblique or italic fonts usually work.
> 
> So, again, I find your suggestion of "italic variant" not helpful.

Got it. ;-) Will stop using "italic" about it! Meanwhile, I think there
*is* something to say about the slant, the slant does seem to be
primarily linked to *style*. Just now, at colourbox.de, I found some
vector icons which are simply labelled as minus icons, both of which
are shaped like the DIVISION SIGN, and which occur side by
side with a plus sign. The labels for the icons are simply “Icon -
minus - schwarz weiß” and “Icon - minus - hellblau”.  See:
<http://www.colourbox.de/vektor/icon-seite-gefaltet-hellgrun-vektor-5753796>

You find it in Google if you search for “kaufmännische Minuszeichen”.
Take that as a hint.

>> Also, I wonder about the claim in the General Punctuation section that
>> commercial minus is used in taxation forms in Scandinavia and Germany.
   […]
> I would not be surprised if the actual situation is a bit more 
> detailed than expressed in Unicode's namelist annotations (or
> even the descriptions in the chapter texts).
> 
> However, I can't assist you in tracking those down as I have access 
> to no taxation forms that use any of these characters. :)

:-)

>> Anyway, when I spoke of 2052 as an italic version of 00F7, I meant in
>> the, kind of, “mathematical” sense: […]

> Where the case for 00F7 and 2052 differs from the mathematical alphabets is
> that in the latter case the shape variants are (to a very large 
> extent) accurately described by the typographical moniker. A bold is a bold.
> 
> The only exception that I can think of is in the realm of "script", 
> where some authors prefer a slightly different style that isn't tied
> to 18th century copperplate.

And by script you mean "handwriting style". That makes sense. That is 
how I perceive the German, commercial minus. 

> Suggest better text for the book chapter that details the precise 
> places that have been established as using 00F7 in the capacity
> of "minus sign". That would be more helpful than trying to somehow 
> treat 00F7 and 2052 as glyphic variants of each other. They are
> separate characters, with distinct usage conventions that simply
> happen to employ both a line and two dots. (The fallback of ./. for
> 2052 is  interesting in this context).

Ok. Will try. Though I think better text would tie them, rather than 
separate them. But I think you are artificially separating them.

> I was focused only on the minority use of 00F7 as a minus sign, in
> which case
> it and 2052 AND 002D and 2212 all function as variants of each other (but
> not as glyphic variants --- they are spelling variants).

Good point. It is like the V and U - they have a common history.

>> It is an argument for seeing 00F7 as (also) a hyphen-minus variant, no?
> Once you get into the dashes, there's tons of variant usage. What's 
> documented in Unicode tends to be from predominantly English-language 
> style manuals, but if you extend this to all publications in all 
> (Western) languages including recent historic times, I'm sure you'd 
> find surprising variations.

Does the Unicode spec say this - that it is predominantly
English-language based?

>> As it is today, no one seems to realize how commercial
>> minus relates to “division sign minus”.
> "additional" names are ruled out - except to fix something that's 
> badly broken.
> Neither of these characters has names that are misleading, mistyped or both.
> 
> There are many characters with deep relations that many users do not know
> about. And, in this case, there seem to be some issues with the precise
> relation you are trying to implement.

I saw it as if in a mist. Now it becomes clearer and clearer to me. :-)

This fun page indicates that the ./. “fallback” has a 35 year history.
http://www.wertpapier-forum.de/topic/14587-kennzahlenanalyse/page__st__20
Which could fit well together with a theory that the script variant
grew in popularity when the “international” ÷ division sign of
computers entered German math. That ÷ as minus “went back” due to
computers and calculators, seems to be the general trend.
-- 
leif halvard silli


From asmusf at ix.netcom.com  Thu Jan 16 11:19:02 2014
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Thu, 16 Jan 2014 09:19:02 -0800
Subject: Commercial minus as italic variant of division sign in German
 and Scandinavian context
In-Reply-To: <20140116171217417407.024080bd@xn--mlform-iua.no>
References: <20140116044305293116.f28ead07@xn--mlform-iua.no>
 <52D7879A.70103@ix.netcom.com>
 <20140116143423686172.a3f32e12@xn--mlform-iua.no>
 <52D7F9BD.8060106@ix.netcom.com>
 <20140116171217417407.024080bd@xn--mlform-iua.no>
Message-ID: <52D81486.6010700@ix.netcom.com>

On 1/16/2014 8:12 AM, Leif Halvard Silli wrote:
> Asmus Freytag, Thu, 16 Jan 2014 07:24:45 -0800:
>> On 1/16/2014 5:34 AM, Leif Halvard Silli wrote:
>>> when looking at my message in Firefox [1], the commercial minus
>>> looks like a “handwritten” variant of the division sign.
>> the fact that the "slant" is reverse, rather than forward,
>> is contrary to the way oblique or italic fonts usually work.
>>
>> So, again, I find your suggestion of "italic variant" not helpful.
> Got it. ;-) Will stop using "italic" about it!
OK. I'll hold you to it.
> Meanwhile, I think there
> *is* something to say about the slant, the slant does seem to be
> primarily linked to *style*.

Style in Unicode is used in two ways.

A) to indicate that a distinction is glyphic and can be ignored
B) to indicate that a glyph shape relates to a typographical style

A is wrong for 00F7 vs 2212 vs 2052. The distinctions are deliberate
and authors (and readers) would take exception if you substituted
another "style" of symbol. The fact is that these are not simply
accidental but correct disunifications.

In a sense, it's no different from "z" being used for the soft-s in
English (if not exclusively), "s" being used for both soft and hard
s in German and never being used for soft-s in Scandinavia.

When Unicode says it encodes the "semantics" of a character,
it doesn't mean that these semantics can't be context sensitive
or that different contexts can't call for different characters for
the same semantics. (In the minus case we are talking mathematical
semantics, while in the letter case we are talking phonetics, but
otherwise there's not a whole lot of distinction in the context
sensitive nature of character use).

The most useful concept (I have found) in these kinds of investigations
is "character identity". Here it is clear that something like 00F7
that can mean both division and minus (based on context) has
a different identity from 2212 or 2052 that (in math use) can only
mean minus. And 2052 is different from 2212 in that it is limited
to certain contexts, and 2212 cannot be used in marking papers.

So, just acknowledge that, and if you feel the need to add value,
do so by better descriptions of which context which character is
used in.
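
To make the "identity" point concrete, here is a minimal sketch, under
the assumptions stated in its comments, of how a reader (or a program)
would have to treat these signs: some are read as minus regardless of
context, while 00F7 needs to be told which convention the document
follows. The function name and the boolean flag are hypothetical, not
anything defined by Unicode.

```python
DIVISION_SIGN = "\u00F7"     # DIVISION SIGN: division, or minus in some traditions
MINUS_SIGN = "\u2212"        # MINUS SIGN
COMMERCIAL_MINUS = "\u2052"  # COMMERCIAL MINUS SIGN

# Assumption for illustration: in math use these are read only as minus.
ALWAYS_MINUS = {MINUS_SIGN, COMMERCIAL_MINUS}


def reads_as_minus(ch: str, division_sign_is_minus: bool = False) -> bool:
    """Say whether `ch` is read as a minus sign.

    `division_sign_is_minus` is a hypothetical flag standing in for the
    document's context (e.g. a form that uses the division sign as minus).
    """
    if ch in ALWAYS_MINUS:
        return True
    if ch == DIVISION_SIGN:
        # The same character has two readings; only context decides.
        return division_sign_is_minus
    return False
```

The flag is the whole point: nothing in the character itself tells you
which reading applies, so the information has to come from a
description of the context.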

B is relevant for math alphabets, because the glyphs really are
constrained to match a typographical style. It's not relevant to
the case here, because 2052 is not a specific "style" of the
"same thing in another font".
> Just now, at colourbox.de, I found some
> vector icons which are simply labelled as minus icons, both of which
> are shaped like the DIVISION SIGN, and which occur side by
> side with a plus sign. The labels for the icons are simply “Icon -
> minus - schwarz weiß” and “Icon - minus - hellblau”.  See:
> <http://www.colourbox.de/vektor/icon-seite-gefaltet-hellgrun-vektor-5753796>
>
> You find it in Google if you search for “kaufmännische Minuszeichen”.
> Take that as a hint.

This could be for two reasons.

A) there is some use where 00F7 has the semantics of minus.
B) the icon is misnamed in the source because of the visual
   similarity with a minus

Unfortunately, by itself, you can't use that source to distinguish A from B.


>
>>> Anyway, when I spoke of 2052 as an italic version of 00F7, I meant in
>>> the, kind of, “mathematical” sense: […]
>> Where the case for 00F7 and 2052 differs from the mathematical alphabets is
>> that in the latter case the shape variants are (to a very large
>> extent) accurately described by the typographical moniker. A bold is a bold.
>>
>> The only exception that I can think of is in the realm of "script",
>> where some authors prefer a slightly different style that isn't tied
>> to 18th century copperplate.
> And by script you mean "handwriting style". That makes sense. That is
> how I perceive the German, commercial minus.
It may be derived from a handwritten mark - most accounting wasn't
typeset - but the exception that I was referring to is Knuth's "Euler"
fonts, which he uses instead of "script" in his mathematical works. Their
ductus retains just faint traces of handwriting, and none of the
elaborate styles of handwriting that typical "script" fonts are based
on, but they serve their purpose in mathematics (unless you are a
purist) because they are distinct from all the other styles and arguably
a bit more readable.

Your applying my comment to 2052 is taking it wildly out of context.
>
>> Suggest better text for the book chapter that details the precise
>> places that have been established as using 00F7 in the capacity
>> of "minus sign". That would be more helpful than trying to somehow
>> treat 00F7 and 2052 as glyphic variants of each other. They are
>> separate characters, with distinct usage conventions that simply
>> happen to employ both a line and two dots. (The fallback of ./. for
>> 2052 is  interesting in this context).
> Ok. Will try. Though I think better text would tie them, rather than
> separate them. But I think you are artificially separating them.
I am arguing that they have a distinct "identity".

That doesn't mean that their usage can't overlap. (That's what you tend
to think of as "ties".) I think it less helpful to consider the
characters "tied" than to describe the usage.

>
>> I was focused only on the minority use of 00F7 as a minus sign, in
>> which case
>> it and 2052 AND 002D and 2212 all function as variants of each other (but
>> not as glyphic variants --- they are spelling variants).
> Good point. It is like the V and U - they have a common history.
The U and V historically derive from the same letter.

00F7 and 2052 use the same elements in a different configuration. That's
ALL that we know about them, unless you have additional research.
Asserting a derivation is complete speculation at this point.

What we can attest is that ./. is a typewriter-supported (if not caused)
variant of 2052 (the exact elevation of the initial dot may have varied
in handwriting as a "free variation", but the typewriter could only do
the period).

We cannot attest that ./. is a variant of 00F7 or that 2052 was ever a
free variant of 00F7. In today's usage, the selection depends on context
(user group, target audience) and is not a free variant.
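
As a small illustration of the attested direction only, a fallback
from 2052 to the typewriter form could be sketched like this (the
function name is made up for the example; nothing here is a defined
Unicode mapping). Note that it deliberately leaves 00F7 alone.

```python
COMMERCIAL_MINUS = "\u2052"   # COMMERCIAL MINUS SIGN
TYPEWRITER_FALLBACK = "./."   # period, slash, period


def to_typewriter(text: str) -> str:
    """Replace each U+2052 with the ASCII fallback './.'.

    One-way only: './.' in existing text is not guaranteed to have been
    a commercial minus, so no reverse mapping is attempted.
    """
    return text.replace(COMMERCIAL_MINUS, TYPEWRITER_FALLBACK)


if __name__ == "__main__":
    print(to_typewriter("100 \u2052 5"))
```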
>
>>> It is an argument for seeing 00F7 as (also) a hyphen-minus variant, no?
>> Once you get into the dashes, there's tons of variant usage. What's
>> documented in Unicode tends to be from predominantly English-language
>> style manuals, but if you extend this to all publications in all
>> (Western) languages including recent historic times, I'm sure you'd
>> find surprising variations.
> Does the Unicode spec say this - that it is predominantly English
> language based?
It goes without saying that authors working in English have easier access to
manuals in that language. It's not intentional, but if you've been around you
would find that in many cases, usage information from other languages has
tended to be incorporated as changes to the original text, not from the start.

So, go ahead and add more.
>
>>> As it is today, no one seems to realize how commercial
>>> minus relates to "division sign minus".
>> "additional" names are ruled out - except to fix something that's
>> badly broken.
>> Neither of these characters has names that are misleading, mistyped or both.
>>
>> There are many characters with deep relations that many users do not know
>> about. And, in this case, there seem to be some issues with the precise
>> relation you are trying to implement.
> I saw it as if in a mist. Now it becomes clearer and clearer to me. :-)
>
> This fun page indicates that the ./. "fallback" has a 35 year history.
> http://www.wertpapier-forum.de/topic/14587-kennzahlenanalyse/page__st__20

No, it says that the history goes back *at least* 35 years. This figure is
probably based on somebody's earliest *personal* recollection, not historical
search, and 35 years tends to span a professional lifetime.
> Which could fit well together with a theory that the script variant
> grew in popularity when the "international" ÷ division sign of
> computers entered German math. That ÷ as minus "went back" due to
> computers and calculators, seems to be the general trend.
That, my friend, is utter and pure nonsense. I would call it an urban legend
in the making. Instead of "mists" you are creating "myths" here, from whole
cloth, no less.

Cheers,

A./


From samjnaa at gmail.com  Tue Jan 21 06:48:26 2014
From: samjnaa at gmail.com (Shriramana Sharma)
Date: Tue, 21 Jan 2014 18:18:26 +0530
Subject: Offlist UniView mini-app
Message-ID: <CAH-HCWV9oW1HEkxVemdhoOVwo=nGm=-iAR5f-OBmNru7Hw6=LA@mail.gmail.com>

Since I have connectivity problems now and then, I wrote a mini-app
using PyQt to give me the basic features of Ishida's UniView (which
also seems to have had some server problems recently)... Maybe it
would be useful to others also so I'm posting here. It's under the GPL
since I use PyQt under the GPL.

Since it depends on PyQt, it is probably immediately usable by Linux
users, esp. who use distros which have PyQt pre-installed or
installable by a single command like apt-get or yum. On other
platforms, you'll have to have installed Python and PyQt as
appropriate...

BTW I use Py3, so maybe a few tweaks would be needed to get it working
with Py2. Since it's GPL, please feel free to make derivatives.

I hope the name "UniView" is not copyrighted or anything. Certainly
don't intend to infringe...

-- 
Shriramana Sharma
-------------- next part --------------
A non-text attachment was scrubbed...
Name: uniview.py
Type: text/x-python
Size: 6386 bytes
Desc: not available
URL: <http://unicode.org/pipermail/unicode/attachments/20140121/da81e952/attachment.py>

From stephan.stiller at gmail.com  Wed Jan 22 02:38:25 2014
From: stephan.stiller at gmail.com (Stephan Stiller)
Date: Wed, 22 Jan 2014 00:38:25 -0800
Subject: Egyptian Demotic
Message-ID: <52DF8381.5080804@gmail.com>

Hi all,

Is Egyptian Demotic on somebody's roadmap for Unicode?

(Egyptian Demotic is what's on the middle third of the Rosetta Stone.)

Stephan


From frederic.grosshans at gmail.com  Wed Jan 22 07:48:05 2014
From: frederic.grosshans at gmail.com (Frédéric Grosshans)
Date: Wed, 22 Jan 2014 14:48:05 +0100
Subject: Egyptian Demotic
In-Reply-To: <52DF8381.5080804@gmail.com>
References: <52DF8381.5080804@gmail.com>
Message-ID: <52DFCC15.20200@gmail.com>

An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20140122/eea372a5/attachment.html>

From samjnaa at gmail.com  Wed Jan 22 23:52:12 2014
From: samjnaa at gmail.com (Shriramana Sharma)
Date: Thu, 23 Jan 2014 11:22:12 +0530
Subject: Offlist UniView mini-app
In-Reply-To: <CAH-HCWV9oW1HEkxVemdhoOVwo=nGm=-iAR5f-OBmNru7Hw6=LA@mail.gmail.com>
References: <CAH-HCWV9oW1HEkxVemdhoOVwo=nGm=-iAR5f-OBmNru7Hw6=LA@mail.gmail.com>
Message-ID: <CAH-HCWXWY5HGkTw8QUoNeiPf8Rb5bs6f0hTepwxHSBF43ZBa_Q@mail.gmail.com>

Not sure if anyone actually tried this app, but just wanted to notify that
I found a small bug. To correct it, insert "4 < " after "elif " on line 15.

Shriramana.

From leob at mailcom.com  Thu Jan 23 00:39:58 2014
From: leob at mailcom.com (Leo Broukhis)
Date: Wed, 22 Jan 2014 22:39:58 -0800
Subject: Another Unicode viewing site
Message-ID: <CAFmvRsfuR8CmCfJJ9ucbxszx+S+3RtfaQ8csEM36p_5OjF-FAQ@mail.gmail.com>

I find http://unicode-table.com/, of which I cannot find a previous mention
on the list, quite convenient (keep scrolling). Not all of Unicode 6.0 and
6.1 is there yet, though, as it is a hobby project of a multi-national
team.
Interface languages include English, German, Russian, Ukrainian, Chinese,
and Thai.

Leo

From boldewyn at gmail.com  Thu Jan 23 08:19:50 2014
From: boldewyn at gmail.com (Manuel Strehl)
Date: Thu, 23 Jan 2014 15:19:50 +0100
Subject: Another Unicode viewing site
In-Reply-To: <CAFmvRsfuR8CmCfJJ9ucbxszx+S+3RtfaQ8csEM36p_5OjF-FAQ@mail.gmail.com>
References: <CAFmvRsfuR8CmCfJJ9ucbxszx+S+3RtfaQ8csEM36p_5OjF-FAQ@mail.gmail.com>
Message-ID: <CAEZUo2dS107_=VDKTgww_BA8OZU1=fS9zpDiURnoozqo2P0F+A@mail.gmail.com>

Yes, they have a huge advantage over my http://codepoints.net: they have a
team that already provides so many translations. I envy them a bit for
that. But competition is good for business. :-)

Cheers,
Manuel



From samjnaa at gmail.com  Thu Jan 23 10:50:49 2014
From: samjnaa at gmail.com (Shriramana Sharma)
Date: Thu, 23 Jan 2014 22:20:49 +0530
Subject: Offlist UniView mini-app
In-Reply-To: <52E1472F.5040501@behdad.org>
References: <CAH-HCWV9oW1HEkxVemdhoOVwo=nGm=-iAR5f-OBmNru7Hw6=LA@mail.gmail.com>
 <CAH-HCWXWY5HGkTw8QUoNeiPf8Rb5bs6f0hTepwxHSBF43ZBa_Q@mail.gmail.com>
 <52E1472F.5040501@behdad.org>
Message-ID: <CAH-HCWU02wByZUFSjM+_xeUNcukveVmZAO5oDGBM1GdHiygVvQ@mail.gmail.com>

On Thu, Jan 23, 2014 at 10:15 PM, Behdad Esfahbod <behdad at behdad.org> wrote:

> lol.  How about you post on github at least?
>

OK good idea. Any objections to re-using the name UniView? I suppose Ishida
is on either of these lists. I would like to hear from him especially.

-- 
Shriramana Sharma

From ishida at w3.org  Thu Jan 23 12:00:01 2014
From: ishida at w3.org (Richard Ishida)
Date: Thu, 23 Jan 2014 18:00:01 +0000
Subject: Offlist UniView mini-app
In-Reply-To: <CAH-HCWU02wByZUFSjM+_xeUNcukveVmZAO5oDGBM1GdHiygVvQ@mail.gmail.com>
References: <CAH-HCWV9oW1HEkxVemdhoOVwo=nGm=-iAR5f-OBmNru7Hw6=LA@mail.gmail.com>
 <CAH-HCWXWY5HGkTw8QUoNeiPf8Rb5bs6f0hTepwxHSBF43ZBa_Q@mail.gmail.com>
 <52E1472F.5040501@behdad.org>
 <CAH-HCWU02wByZUFSjM+_xeUNcukveVmZAO5oDGBM1GdHiygVvQ@mail.gmail.com>
Message-ID: <52E158A1.3080005@w3.org>

Well, I would prefer you don't use the name UniView, since that would 
create confusion.

The reason my UniView (and UniView lite) tool is currently unavailable 
is that my site was hacked a week or so ago and I'm rebuilding it 
online. I'm still working on a solution for hosting the pages that need 
to run in PHP, but I expect UniView to be back in operation soon.

Thank you.
RI




From samjnaa at gmail.com  Thu Jan 23 12:01:25 2014
From: samjnaa at gmail.com (Shriramana Sharma)
Date: Thu, 23 Jan 2014 23:31:25 +0530
Subject: Offlist UniView mini-app
In-Reply-To: <52E158A1.3080005@w3.org>
References: <CAH-HCWV9oW1HEkxVemdhoOVwo=nGm=-iAR5f-OBmNru7Hw6=LA@mail.gmail.com>
 <CAH-HCWXWY5HGkTw8QUoNeiPf8Rb5bs6f0hTepwxHSBF43ZBa_Q@mail.gmail.com>
 <52E1472F.5040501@behdad.org>
 <CAH-HCWU02wByZUFSjM+_xeUNcukveVmZAO5oDGBM1GdHiygVvQ@mail.gmail.com>
 <52E158A1.3080005@w3.org>
Message-ID: <CAH-HCWUbJWo2snVfAAkgZ99NTp5tBOr8hyCTfhWC_+rAJNNu3w@mail.gmail.com>

On Thu, Jan 23, 2014 at 11:30 PM, Richard Ishida <ishida at w3.org> wrote:

> Well, I would prefer you don't use the name UniView, since that would
> create confusion.
>

OK thanks for that. I am not particular about the name -- though it is
quite apt. Will think of something else... Just something bland like
"Codepoint Viewer" would do I suppose...

-- 
Shriramana Sharma

From samjnaa at gmail.com  Thu Jan 23 12:26:53 2014
From: samjnaa at gmail.com (Shriramana Sharma)
Date: Thu, 23 Jan 2014 23:56:53 +0530
Subject: Offlist UniView mini-app
In-Reply-To: <52E1472F.5040501@behdad.org>
References: <CAH-HCWV9oW1HEkxVemdhoOVwo=nGm=-iAR5f-OBmNru7Hw6=LA@mail.gmail.com>
 <CAH-HCWXWY5HGkTw8QUoNeiPf8Rb5bs6f0hTepwxHSBF43ZBa_Q@mail.gmail.com>
 <52E1472F.5040501@behdad.org>
Message-ID: <CAH-HCWUYEe+f0y90uqbKBfF8MJ+DWSo_fgAJTC01r7AM8YEEbw@mail.gmail.com>

On Thu, Jan 23, 2014 at 10:15 PM, Behdad Esfahbod <behdad at behdad.org> wrote:

> lol.  How about you post on github at least?
>

Thanks for the encouragement. I didn't think it would be *that* important
to do that. Please visit now:

https://github.com/jamadagni/cpview

-- 
Shriramana Sharma

From johannes at bergerhausen.com  Fri Jan 24 07:55:07 2014
From: johannes at bergerhausen.com (Johannes Bergerhausen)
Date: Fri, 24 Jan 2014 14:55:07 +0100
Subject: Another Unicode viewing site
In-Reply-To: <CAEZUo2dS107_=VDKTgww_BA8OZU1=fS9zpDiURnoozqo2P0F+A@mail.gmail.com>
References: <CAFmvRsfuR8CmCfJJ9ucbxszx+S+3RtfaQ8csEM36p_5OjF-FAQ@mail.gmail.com>
 <CAEZUo2dS107_=VDKTgww_BA8OZU1=fS9zpDiURnoozqo2P0F+A@mail.gmail.com>
Message-ID: <9B1C6C9E-CE67-436B-A990-98F1B98F011C@bergerhausen.com>

We are working on an update of decodeunicode.org

Johannes


From verdy_p at wanadoo.fr  Fri Jan 24 11:14:54 2014
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Fri, 24 Jan 2014 18:14:54 +0100
Subject: Another Unicode viewing site
In-Reply-To: <CAFmvRsfuR8CmCfJJ9ucbxszx+S+3RtfaQ8csEM36p_5OjF-FAQ@mail.gmail.com>
References: <CAFmvRsfuR8CmCfJJ9ucbxszx+S+3RtfaQ8csEM36p_5OjF-FAQ@mail.gmail.com>
Message-ID: <CAGa7JC36zQti+6XRHdw4bCkkqJuhn8HfjZa9UuT+yZqnZUSvkA@mail.gmail.com>

The bad thing is that the whole BMP is loaded in a giant HTML table encoded
in a very inefficient way. Worse, everything uses webfonts of 4K glyphs, so
the page takes a lot of memory with those temporary fonts.

The webfonts do not seem to load dynamically on demand. That is strange,
because the page is also full of JavaScript, and JavaScript could have loaded
just the necessary webfonts on demand and generated the page on the fly,
with just enough rows to fit the screen while still allowing the table to be
scrolled, without leaving all those webfonts active in the document.
JavaScript could also have detected suitable fonts already installed on the
PC running the browser, and the page would have been much lighter.
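
To illustrate the on-demand idea, here is a rough sketch of the windowing
arithmetic only, written in Python for brevity rather than the JavaScript the
page would actually use. The row width of 16 code points, the 24-pixel row
height, and the function names are all assumptions made for illustration;
only the 4K-glyph font size comes from the observation above.

```python
ROW_WIDTH = 16        # code points per table row (assumed layout)
FONT_BLOCK = 0x1000   # 4K glyphs per webfont, as observed on the site


def visible_rows(scroll_px, viewport_px, row_height_px=24, last_cp=0xFFFF):
    """Return the range of table rows that intersect the viewport."""
    total_rows = (last_cp + ROW_WIDTH) // ROW_WIDTH
    first = max(0, scroll_px // row_height_px)
    # Ceiling division, so a partially visible last row is included.
    last = min(total_rows, -(-(scroll_px + viewport_px) // row_height_px))
    return range(first, last)


def fonts_needed(rows):
    """Return the indices of the 4K-glyph font blocks covering those rows."""
    if not rows:
        return []
    first_cp = rows[0] * ROW_WIDTH
    last_cp = rows[-1] * ROW_WIDTH + ROW_WIDTH - 1
    return list(range(first_cp // FONT_BLOCK, last_cp // FONT_BLOCK + 1))


rows = visible_rows(scroll_px=6000, viewport_px=600)
print(len(rows), fonts_needed(rows))
```

With these assumed numbers, only a couple of dozen rows and at most two font
blocks would be live at any scroll position, instead of the whole BMP.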



From leob at mailcom.com  Fri Jan 24 22:31:37 2014
From: leob at mailcom.com (Leo Broukhis)
Date: Fri, 24 Jan 2014 20:31:37 -0800
Subject: Another Unicode viewing site
In-Reply-To: <CAGa7JC36zQti+6XRHdw4bCkkqJuhn8HfjZa9UuT+yZqnZUSvkA@mail.gmail.com>
References: <CAFmvRsfuR8CmCfJJ9ucbxszx+S+3RtfaQ8csEM36p_5OjF-FAQ@mail.gmail.com>
 <CAGa7JC36zQti+6XRHdw4bCkkqJuhn8HfjZa9UuT+yZqnZUSvkA@mail.gmail.com>
Message-ID: <CAFmvRsfY9ju05F0qUrRn9h6z8=L4LGHkFr2dfw4iTC0-gquV=g@mail.gmail.com>

Hi Philippe,

I have no relation to the project; you may want to leave your feedback
directly on the site.

Leo



From kojiishi at gluesoft.co.jp  Mon Jan 27 19:18:15 2014
From: kojiishi at gluesoft.co.jp (Koji Ishii)
Date: Tue, 28 Jan 2014 01:18:15 +0000
Subject: [css-writing-modes] nit
In-Reply-To: <5296D692.2020201@css-class.com>
References: <52957E08.1060000@inkedblade.net> <52960D39.5050309@ix.netcom.com>
 <CAAWBYDCJHACxu6GoBtFCHOxgKA4ZoTg9J4RheYAmeMZ0LKdGWQ@mail.gmail.com>
 <5296D692.2020201@css-class.com>
Message-ID: <d1b3871200da4e11989f8f7209bc1176@HKXPR01MB005.apcprd01.prod.exchangelabs.com>

> Possibly all notes and issues and use overflow:auto since any widths
> also applied to the notes and issues may end up having things hidden.

Thank you Alan for the suggestion, fixed in the editor's draft[1].

[1] http://dev.w3.org/csswg/css-writing-modes/


From kojiishi at gluesoft.co.jp  Mon Jan 27 19:34:38 2014
From: kojiishi at gluesoft.co.jp (Koji Ishii)
Date: Tue, 28 Jan 2014 01:34:38 +0000
Subject: [CSSWG][css-writing-modes] Last Call for Comments on CSS3
 Writing  Modes
In-Reply-To: <BLU174-W409D4099EDCD4ED53E28ACB3C60@phx.gbl>
References: <BLU174-W409D4099EDCD4ED53E28ACB3C60@phx.gbl>
Message-ID: <5C1870EC-0ED7-400A-A469-FB6635D4FEB1@gluesoft.co.jp>

On Dec 21, 2013, at 20:39, CE Whitehead <cewcathar at hotmail.com> wrote:

> 4.3
> "alphabetic
>     The alphabetic baseline is assumed to be at the under margin edge.
> "central
>     The central baseline is assumed to be halfway between the under and over margin edges of the box. "
> =>
> "alphabetic
>     The alphabetic baseline is assumed to be at the under-margin edge.
> "central
>     The central baseline is assumed to be halfway between the under- and over-margin edges of the box. "
>
> {COMMENT:  normally when you use two words to modify a single word, as when "under margin", "over margin" modify the word, "edge" or "edges", then it is customary to join the two modifying words with a hyphen.}

Fixed.

> 6.2
> inline-start
>
> "Nominally the side from which text of its inline base direction will start. For boxes with a used direction value of ltr, this means the line-left side. For boxes with a used direction value of rtl, this means the line-right side. "
> =>
> "The side of a box from which text will start. For boxes with a used direction value of ltr, this means the line-left side. For boxes with a used direction value of rtl, this means the line-right side. "
> ?
> {COMMENT: This text is unclear to me; not sure what you mean by "its" -- the box's?; I am not sure thus how to reword "inline base direction" -- so I left this phrase out though you probably need something. Also do you need to say "Nominally"? Because "nominally" does not mean anything to me in this sentence, though normally "nominally" is defined as "in name" -- but I cannot see saying this here; it just seems to not be the right word. Also finally, and I know this is a dumb question, but why can the inline--start never be at the top or the bottom, when the lines run top-to-bottom or bottom-to-top? The diagram seems to suggest that inline-start can be at the bottom or top.}

Please allow me to work on this later.

> 6.2 second paragraph (after the list of four "flow-relative  directions" -- block-end, block-start, etc.)
> "Where unambiguous (or dual-meaning), the terms start and end are used in place of block-start/inline-start and block-end/inline-end, respectively."
>
> {COMMENT: "unambiguous" is the opposite of "dual-meaning" -- "dual meaning" means "ambiguous"; do you mean the following? (if so it's o.k. to eliminate the stuff in parentheses altogether):}

Fixed.

> 6.3 Line-relative directions
>
> Figure 15, Figure 16
> {COMMENT: is it possible to have more space between these two figures?}

Fixed.

/koji


From jknappen at web.de  Wed Jan 29 08:59:43 2014
From: jknappen at web.de ("Jörg Knappen")
Date: Wed, 29 Jan 2014 15:59:43 +0100 (CET)
Subject: Aw: Re:  Re: Re: Re: Re: Do you know a tool to decode "UTF-8 twice"
In-Reply-To: <CANDQx1rvuFVKHXg0SfzZgF=qTiNfqTKzRO-CksgdZRP1DXD6Bw@mail.gmail.com>
References: <trinity-0572f1ab-7333-465a-bf2f-275393a88e3f-1382949449351@3capp-webde-bs49>
 <20131028113410.wIOIFPvYvi4rX2/05EAm1cSW@dietcurd.local>
 <trinity-c2610e46-b75a-4935-81d7-21984bb9fe79-1382965565882@3capp-webde-bs49>
 <20131028162337.ZLPXM/tUQIjeRcuZdnxGd5T6@dietcurd.local>
 <CAN49p6pSVHmLeeXvSBXA7biMpck0SS26m-MAt=+y2kiA1QMGcg@mail.gmail.com>
 <20131029100950.q993HcL4O6e8kWFFZojh+81p@dietcurd.local>
 <trinity-65ff1aeb-d465-4a26-9164-9568df947870-1383063301198@3capp-webde-bs16>
 <527118E0.90501@gmail.com>
 <trinity-b7a05bc6-633d-42e6-976f-8bfe4a63e7a0-1383146033551@3capp-webde-bs26>
 <52712C94.7040102@gmail.com>
 <trinity-55d066ba-8eda-4966-9c6c-8beacccb3924-1383150733623@3capp-webde-bs26>
 <52713A3D.4090306@gmail.com>,
 <CANDQx1rvuFVKHXg0SfzZgF=qTiNfqTKzRO-CksgdZRP1DXD6Bw@mail.gmail.com>
Message-ID: <trinity-bf3e09ed-9a57-4e5e-8e01-0a63a2c83d17-1391007583446@3capp-webde-bs22>

An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20140129/7ddeb971/attachment.html>

From buck at yelp.com  Wed Jan 29 12:21:55 2014
From: buck at yelp.com (Buck Golemon)
Date: Wed, 29 Jan 2014 10:21:55 -0800
Subject: Do you know a tool to decode "UTF-8 twice"
In-Reply-To: <trinity-bf3e09ed-9a57-4e5e-8e01-0a63a2c83d17-1391007583446@3capp-webde-bs22>
References: <trinity-0572f1ab-7333-465a-bf2f-275393a88e3f-1382949449351@3capp-webde-bs49>
 <20131028113410.wIOIFPvYvi4rX2/05EAm1cSW@dietcurd.local>
 <trinity-c2610e46-b75a-4935-81d7-21984bb9fe79-1382965565882@3capp-webde-bs49>
 <20131028162337.ZLPXM/tUQIjeRcuZdnxGd5T6@dietcurd.local>
 <CAN49p6pSVHmLeeXvSBXA7biMpck0SS26m-MAt=+y2kiA1QMGcg@mail.gmail.com>
 <20131029100950.q993HcL4O6e8kWFFZojh+81p@dietcurd.local>
 <trinity-65ff1aeb-d465-4a26-9164-9568df947870-1383063301198@3capp-webde-bs16>
 <527118E0.90501@gmail.com>
 <trinity-b7a05bc6-633d-42e6-976f-8bfe4a63e7a0-1383146033551@3capp-webde-bs26>
 <52712C94.7040102@gmail.com>
 <trinity-55d066ba-8eda-4966-9c6c-8beacccb3924-1383150733623@3capp-webde-bs26>
 <52713A3D.4090306@gmail.com>
 <CANDQx1rvuFVKHXg0SfzZgF=qTiNfqTKzRO-CksgdZRP1DXD6Bw@mail.gmail.com>
 <trinity-bf3e09ed-9a57-4e5e-8e01-0a63a2c83d17-1391007583446@3capp-webde-bs22>
Message-ID: <CANDQx1qUJb1MCDU-pS0ROwpe8aNh8ZBhKTwA0YYAmuTD+QkYSg@mail.gmail.com>

Jörg:

This is the definition of cp1252 used by the whatwg and all current browser
implementations.
I've appealed to the cp1252 maintainer to update the definition so that we
don't have two competing standards, but I was rejected.
I've been considering naming it cp1252-whatwg.
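
For illustration, a minimal sketch of how the two definitions differ,
assuming Python 3's built-in cp1252 and latin1 codecs. The function name
decode_cp1252_whatwg is made up for this example and is not part of any
library; it only mimics the browser behaviour for the five bytes that the
unicode.org table leaves undefined.

```python
# Bytes that Python's cp1252 codec leaves undefined (per the unicode.org table).
UNDEFINED_IN_CP1252 = bytes([0x81, 0x8D, 0x8F, 0x90, 0x9D])


def decode_cp1252_whatwg(data):
    """Decode bytes as cp1252, mapping the undefined bytes to C1 controls."""
    chars = []
    for i in range(len(data)):
        byte = data[i:i + 1]
        try:
            chars.append(byte.decode('cp1252'))
        except UnicodeDecodeError:
            # latin1 maps every byte to the code point with the same value.
            chars.append(byte.decode('latin1'))
    return ''.join(chars)


# The euro sign decodes the same way under both definitions ...
assert decode_cp1252_whatwg(b'\x80') == '\u20ac'
# ... but only the browser-style variant accepts the five extra bytes.
assert decode_cp1252_whatwg(UNDEFINED_IN_CP1252) == '\x81\x8d\x8f\x90\x9d'
```

Decoding byte by byte is slow but keeps the fallback explicit; a real codec
would presumably use a lookup table instead.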


On Wed, Jan 29, 2014 at 6:59 AM, "Jörg Knappen" <jknappen at web.de> wrote:

> A little postscriptum to this old thread:
>
> On PyPI, there is now a codec available that handles the peculiar
> definition of "latin1" inside MySQL.
> The package is called mysql-latin1-codec and features an encoding
> consisting of cp1252 plus
> 0x81, 0x8D, 0x8F, 0x90, 0x9D (the latter five characters are undefined in
> the Python codec for cp1252).
>
>  https://pypi.python.org/pypi/mysql-latin1-codec/1.0
>
> --Jörg Knappen
>
>  *Sent:* Wednesday, 30 October 2013 at 19:14
> *From:* "Buck Golemon" <buck at yelp.com>
> *To:* "Frédéric Grosshans" <frederic.grosshans at gmail.com>
> *Cc:* "Jörg Knappen" <jknappen at web.de>, unicode <unicode at unicode.org>
> *Subject:* Re: Aw: Re: Re: Re: Re: Do you know a tool to decode "UTF-8
> twice"
>
>
> On Wed, Oct 30, 2013 at 9:56 AM, Frédéric Grosshans <
> frederic.grosshans at gmail.com> wrote:
>>
>> On 30/10/2013 17:32, "Jörg Knappen" wrote:
>>
>>>
>>> The data did not only contain latin-1 type mangling for the non-existent
>>> Windows characters, but also sequences with the raw
>>> C1 control characters for all of latin-1. So I had to do them, too.
>>> The data weren't consistent at all, not even in their errors.
>>> --Jörg Knappen
>>
>>  Your question helped me dust off and repair a non-working Python snippet
>> I wrote for a similar problem. I was stuck with the mixing of windows-1252
>> and latin1 controls (linked with Chinese characters). I write it below
>> for reference.
>>
>> The Python snippet below does not need sed; it defines a function
>> (unscramble(S)) which works on strings. The extension to files should be
>> easy.
>>
>>     Frédéric Grosshans
>>
>>
>> def Step1Filter(S):
>>     for c in S :
>>     #works character/character because of the cp1252/latin1 ambiguity
>>         try :
>>             yield c.encode('cp1252')
>>         except UnicodeEncodeError :
>>             yield c.encode('latin1')
>>             #Useful where cp1252 is undefined (81, 8D, 8F, 90, 9D)
>>
>> def unscramble(S):
>>     return b''.join(c for c in Step1Filter(S)).decode('utf8')
>>
>> PS: If anyone is interested in a licence, I consider this simple enough
>> to be in the public domain and uncopyrightable.
>>
>
>  This encoding you've implemented above is known as windows-1252 by the
> whatwg and all browsers [1][2].
> The implementation of cp1252 in python is instead a direct consequence of
> the unicode.org definition [3].
>
>  [1] http://encoding.spec.whatwg.org/index-windows-1252.txt
>  [2] http://bukzor.github.io/encodings/cp1252.html
>  [3]
> http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
>
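
As a usage sketch for the snippet quoted above (restated here so that it runs
on its own under Python 3; the helper name original_bytes is a renaming for
this example, and the sample strings are made-up test inputs), this is how
text that went through "UTF-8 twice" comes back out:

```python
def original_bytes(text):
    """Yield the byte that each mis-decoded character originally stood for."""
    for ch in text:
        try:
            yield ch.encode('cp1252')
        except UnicodeEncodeError:
            # C1 controls such as U+0081 have no mapping in Python's cp1252.
            yield ch.encode('latin1')


def unscramble(text):
    return b''.join(original_bytes(text)).decode('utf8')


# U+00E9 is C3 A9 in UTF-8; read as cp1252 that becomes U+00C3 U+00A9.
assert unscramble('\u00c3\u00a9') == '\u00e9'
# U+00C1 is C3 81 in UTF-8; a browser-style decoder turns byte 81 into the
# C1 control U+0081, which only the latin1 fallback can encode again.
assert unscramble('\u00c3\u0081') == '\u00c1'
```

Characters outside both cp1252 and latin1 still raise UnicodeEncodeError,
exactly as in the quoted original.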

From kojiishi at gluesoft.co.jp  Wed Jan 29 12:24:11 2014
From: kojiishi at gluesoft.co.jp (Koji Ishii)
Date: Wed, 29 Jan 2014 18:24:11 +0000
Subject: [CSSWG][css-writing-modes] Last Call for Comments on CSS3
 Writing  Modes
In-Reply-To: <5C1870EC-0ED7-400A-A469-FB6635D4FEB1@gluesoft.co.jp>
References: <BLU174-W409D4099EDCD4ED53E28ACB3C60@phx.gbl>
 <5C1870EC-0ED7-400A-A469-FB6635D4FEB1@gluesoft.co.jp>
Message-ID: <F331F92B-550D-45BB-8C87-2C3EAF5E3741@gluesoft.co.jp>


On Jan 27, 2014, at 17:34, Koji Ishii <kojiishi at gluesoft.co.jp> wrote:

> On Dec 21, 2013, at 20:39, CE Whitehead <cewcathar at hotmail.com> wrote:
>> 6.2
>> inline-start
>>
>> "Nominally the side from which text of its inline base direction will start. For boxes with a used direction value of ltr, this means the line-left side. For boxes with a used direction value of rtl, this means the line-right side. "
>> =>
>> "The side of a box from which text will start. For boxes with a used direction value of ltr, this means the line-left side. For boxes with a used direction value of rtl, this means the line-right side. "
>> ?
>> {COMMENT: This text is unclear to me; not sure what you mean by "its" -- the box's?; I am not sure thus how to reword "inline base direction" -- so I left this phrase out though you probably need something. Also do you need to say "Nominally"? Because "nominally" does not mean anything to me in this sentence, though normally "nominally" is defined as "in name" -- but I cannot see saying this here; it just seems to not be the right word. Also finally, and I know this is a dumb question, but why can the inline--start never be at the top or the bottom, when the lines run top-to-bottom or bottom-to-top? The diagram seems to suggest that inline-start can be at the bottom or top.}
>
> Please allow me to work on this later.

Fixed.

/koji


From buck at yelp.com  Wed Jan 29 12:32:05 2014
From: buck at yelp.com (Buck Golemon)
Date: Wed, 29 Jan 2014 10:32:05 -0800
Subject: Do you know a tool to decode "UTF-8 twice"
In-Reply-To: <CANDQx1qUJb1MCDU-pS0ROwpe8aNh8ZBhKTwA0YYAmuTD+QkYSg@mail.gmail.com>
References: <trinity-0572f1ab-7333-465a-bf2f-275393a88e3f-1382949449351@3capp-webde-bs49>
 <20131028113410.wIOIFPvYvi4rX2/05EAm1cSW@dietcurd.local>
 <trinity-c2610e46-b75a-4935-81d7-21984bb9fe79-1382965565882@3capp-webde-bs49>
 <20131028162337.ZLPXM/tUQIjeRcuZdnxGd5T6@dietcurd.local>
 <CAN49p6pSVHmLeeXvSBXA7biMpck0SS26m-MAt=+y2kiA1QMGcg@mail.gmail.com>
 <20131029100950.q993HcL4O6e8kWFFZojh+81p@dietcurd.local>
 <trinity-65ff1aeb-d465-4a26-9164-9568df947870-1383063301198@3capp-webde-bs16>
 <527118E0.90501@gmail.com>
 <trinity-b7a05bc6-633d-42e6-976f-8bfe4a63e7a0-1383146033551@3capp-webde-bs26>
 <52712C94.7040102@gmail.com>
 <trinity-55d066ba-8eda-4966-9c6c-8beacccb3924-1383150733623@3capp-webde-bs26>
 <52713A3D.4090306@gmail.com>
 <CANDQx1rvuFVKHXg0SfzZgF=qTiNfqTKzRO-CksgdZRP1DXD6Bw@mail.gmail.com>
 <trinity-bf3e09ed-9a57-4e5e-8e01-0a63a2c83d17-1391007583446@3capp-webde-bs22>
 <CANDQx1qUJb1MCDU-pS0ROwpe8aNh8ZBhKTwA0YYAmuTD+QkYSg@mail.gmail.com>
Message-ID: <CANDQx1pxeNDyG4+hJ7iX0QpPTNvQVTC2je--kYoaLJ3uQeNNnw@mail.gmail.com>

Jörg:

In case you want to see the previous discussions on the subject, here they
are:

 * "data for cp1252"
http://www.unicode.org/mail-arch/unicode-ml/y2012-m11/0233.html
 * "cp1252 decoder implementation"
http://www.unicode.org/mail-arch/unicode-ml/y2012-m11/0167.html
 * tangential "latin1 decoder implementation"
http://www.unicode.org/mail-arch/unicode-ml/y2012-m11/0146.html


On Wed, Jan 29, 2014 at 10:21 AM, Buck Golemon <buck at yelp.com> wrote:

> J?rg:
>
> This is the definition of cp1252 used by the whatwg and all current
> browser implementations.
> I've appealed to the cp1252 maintainer to update the definition so that we
> don't have two competing standards, but I was rejected.
> I've been considering naming it cp1252-whatwg.
>
>
> On Wed, Jan 29, 2014 at 6:59 AM, "J?rg Knappen" <jknappen at web.de> wrote:
>
>> A little postscrptum to this old thread:
>>
>> On pyPi, there is now a codec available that handles the peculiar
>> definition of "latin1" inside mysql.
>> The package is called mysql-latin1-codec and features an encoding
>> consisting of cp1252 plus
>> 0x81, 0x8D, 0x8F, 0x90, 0x9D (the latter five characters are undefined in
>> the  python codec for cp1252).
>>
>>  https://pypi.python.org/pypi/mysql-latin1-codec/1.0
>>
>> --J?rg Knappen
>>
>>  *Gesendet:* Mittwoch, 30. Oktober 2013 um 19:14 Uhr
>> *Von:* "Buck Golemon" <buck at yelp.com>
>> *An:* "Fr?d?ric Grosshans" <frederic.grosshans at gmail.com>
>> *Cc:* "J?rg Knappen" <jknappen at web.de>, unicode <unicode at unicode.org>
>> *Betreff:* Re: Aw: Re: Re: Re: Re: Do you know a tool to decode "UTF-8
>> twice"
>>
>>
>> On Wed, Oct 30, 2013 at 9:56 AM, Fr?d?ric Grosshans <
>> frederic.grosshans at gmail.com> wrote:
>>>
>>> Le 30/10/2013 17:32, "J?rg Knappen" a ?crit :
>>>
>>>>
>>>> The data did not only contain latin-1 type mangling for the
>>>> non-existent Windows characters, but also sequences with the raw
>>>> C1 control characters for all of latin-1. So I had to do them, too.
>>>> The data weren't consistent at all, not even in their errors.
>>>> --J?rg Knappen
>>>
>>>  Your question helped me dust off and repair a non working python
>>> snippet I wrote for a similar problem. I was stuck with the mixing of
>>> windows-1252 and latin1 controls (linked with a chinese characters). I
>>> write it below for reference.
>>>
>>> The python snippet below does not need sed, defines a function
>>> (unscramble(S)) which works on strings. The extension to files should be
>>> easy.
>>>
>>>     Fr?d?ric Grosshans
>>>
>>>
>>> def Step1Filter(S):
>>>     for c in S :
>>>     #works character/character because of the cp1252/latin1 ambiguity
>>>         try :
>>>             yield c.encode('cp1252')
>>>         except UnicodeEncodeError :
>>>             yield c.encode('latin1')
>>>             #Useful where cp1252 is undefined (81, 8D, 8F, 90, 9D)
>>>
>>> def unscramble(S):
>>>     return b''.join(c for c in Step1Filter(S)).decode('utf8')
>>>
>>> PS: If anyone is interested in a licence, I consider this simple enough
>>> to be in the public domain and uncopyrightable.
>>>
>>
>>  This encoding you've implemented above is known as windows-1252 by the
>> whatwg and all browsers [1][2].
>> The implementation of cp1252 in python is instead a direct consequence of
>> the unicode.org definition [3].
>>
>>  [1] http://encoding.spec.whatwg.org/index-windows-1252.txt
>>  [2] http://bukzor.github.io/encodings/cp1252.html
>>  [3]
>> http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
>>
>
>
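For reference, the double encoding discussed in this thread, and the way the
quoted snippet undoes it, can be sketched as follows. This is only an
illustration under stated assumptions, not anyone's released code: the names
undefined_in_cp1252, mangle and unscramble_text are made up for this example,
and it assumes Python 3, whose cp1252 codec rejects the bytes 81, 8D, 8F, 90
and 9D.

```python
# Byte values the thread reports as undefined in Python's cp1252 codec.
UNDEFINED = (0x81, 0x8D, 0x8F, 0x90, 0x9D)


def undefined_in_cp1252():
    """List the byte values Python's cp1252 codec refuses to decode."""
    missing = []
    for value in range(256):
        try:
            bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            missing.append(value)
    return missing


def mangle(text):
    """Simulate the damage: UTF-8 bytes misread as browser-style windows-1252."""
    return "".join(
        chr(b) if b in UNDEFINED else bytes([b]).decode("cp1252")
        for b in text.encode("utf-8")
    )


def unscramble_text(mangled):
    """Undo the damage character by character, as in the quoted snippet."""
    raw = bytearray()
    for ch in mangled:
        try:
            raw += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Characters cp1252 cannot encode (such as the C1 controls)
            # fall back to latin1.
            raw += ch.encode("latin1")
    return raw.decode("utf-8")
```

A character such as U+00C1, whose UTF-8 form is C3 81, is the interesting
case: the second byte has no cp1252 meaning, so a strict cp1252 round trip
fails on it while the latin1 fallback recovers it.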
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20140129/0d3d3611/attachment.html>

From markus.icu at gmail.com  Wed Jan 29 13:22:35 2014
From: markus.icu at gmail.com (Markus Scherer)
Date: Wed, 29 Jan 2014 11:22:35 -0800
Subject: Do you know a tool to decode "UTF-8 twice"
In-Reply-To: <CANDQx1qUJb1MCDU-pS0ROwpe8aNh8ZBhKTwA0YYAmuTD+QkYSg@mail.gmail.com>
References: <trinity-0572f1ab-7333-465a-bf2f-275393a88e3f-1382949449351@3capp-webde-bs49>
 <20131028113410.wIOIFPvYvi4rX2/05EAm1cSW@dietcurd.local>
 <trinity-c2610e46-b75a-4935-81d7-21984bb9fe79-1382965565882@3capp-webde-bs49>
 <20131028162337.ZLPXM/tUQIjeRcuZdnxGd5T6@dietcurd.local>
 <CAN49p6pSVHmLeeXvSBXA7biMpck0SS26m-MAt=+y2kiA1QMGcg@mail.gmail.com>
 <20131029100950.q993HcL4O6e8kWFFZojh+81p@dietcurd.local>
 <trinity-65ff1aeb-d465-4a26-9164-9568df947870-1383063301198@3capp-webde-bs16>
 <527118E0.90501@gmail.com>
 <trinity-b7a05bc6-633d-42e6-976f-8bfe4a63e7a0-1383146033551@3capp-webde-bs26>
 <52712C94.7040102@gmail.com>
 <trinity-55d066ba-8eda-4966-9c6c-8beacccb3924-1383150733623@3capp-webde-bs26>
 <52713A3D.4090306@gmail.com>
 <CANDQx1rvuFVKHXg0SfzZgF=qTiNfqTKzRO-CksgdZRP1DXD6Bw@mail.gmail.com>
 <trinity-bf3e09ed-9a57-4e5e-8e01-0a63a2c83d17-1391007583446@3capp-webde-bs22>
 <CANDQx1qUJb1MCDU-pS0ROwpe8aNh8ZBhKTwA0YYAmuTD+QkYSg@mail.gmail.com>
Message-ID: <CAN49p6pXjDM2c6rvme9+0Q30-9Zt=ZCTGHK-CDmxHbLGALPY6g@mail.gmail.com>

On Wed, Jan 29, 2014 at 10:21 AM, Buck Golemon <buck at yelp.com> wrote:

> I've been considering naming it cp1252-whatwg.
>

It would be nicer to put the organization name first, such as whatwg-cp1252
or maybe better html-cp1252. That would be more like ibm-932 and such.

markus
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20140129/d4742918/attachment.html>

From buck at yelp.com  Wed Jan 29 13:57:08 2014
From: buck at yelp.com (Buck Golemon)
Date: Wed, 29 Jan 2014 11:57:08 -0800
Subject: Do you know a tool to decode "UTF-8 twice"
In-Reply-To: <CADnb78i5787v08-PODpz8wyjn-8=6T9eZXnMjQdQozKyjMxr+Q@mail.gmail.com>
References: <trinity-0572f1ab-7333-465a-bf2f-275393a88e3f-1382949449351@3capp-webde-bs49>
 <20131028113410.wIOIFPvYvi4rX2/05EAm1cSW@dietcurd.local>
 <trinity-c2610e46-b75a-4935-81d7-21984bb9fe79-1382965565882@3capp-webde-bs49>
 <20131028162337.ZLPXM/tUQIjeRcuZdnxGd5T6@dietcurd.local>
 <CAN49p6pSVHmLeeXvSBXA7biMpck0SS26m-MAt=+y2kiA1QMGcg@mail.gmail.com>
 <20131029100950.q993HcL4O6e8kWFFZojh+81p@dietcurd.local>
 <trinity-65ff1aeb-d465-4a26-9164-9568df947870-1383063301198@3capp-webde-bs16>
 <527118E0.90501@gmail.com>
 <trinity-b7a05bc6-633d-42e6-976f-8bfe4a63e7a0-1383146033551@3capp-webde-bs26>
 <52712C94.7040102@gmail.com>
 <trinity-55d066ba-8eda-4966-9c6c-8beacccb3924-1383150733623@3capp-webde-bs26>
 <52713A3D.4090306@gmail.com>
 <CANDQx1rvuFVKHXg0SfzZgF=qTiNfqTKzRO-CksgdZRP1DXD6Bw@mail.gmail.com>
 <trinity-bf3e09ed-9a57-4e5e-8e01-0a63a2c83d17-1391007583446@3capp-webde-bs22>
 <CANDQx1qUJb1MCDU-pS0ROwpe8aNh8ZBhKTwA0YYAmuTD+QkYSg@mail.gmail.com>
 <CAN49p6pXjDM2c6rvme9+0Q30-9Zt=ZCTGHK-CDmxHbLGALPY6g@mail.gmail.com>
 <CADnb78i5787v08-PODpz8wyjn-8=6T9eZXnMjQdQozKyjMxr+Q@mail.gmail.com>
Message-ID: <CANDQx1p5kgZCrGgwbbPhN_K7xX=Eyy5TTho81ewxG5FZ9GuBnw@mail.gmail.com>

Anne: Given that the intent is to implement exactly the whatwg spec, and
the group is currently called "whatwg" (even though it may eventually
become a historical artifact), is "whatwg-1252" most appropriate?

Norbert Lindenberg previously suggested standardizing some kind of
disambiguation.
http://www.unicode.org/mail-arch/unicode-ml/y2012-m12/0022.html

Do you most prefer the s/web-/cp/ pattern?


On Wed, Jan 29, 2014 at 11:53 AM, Anne van Kesteren <annevk at annevk.nl> wrote:

> On Wed, Jan 29, 2014 at 11:22 AM, Markus Scherer <markus.icu at gmail.com>
> wrote:
> > On Wed, Jan 29, 2014 at 10:21 AM, Buck Golemon <buck at yelp.com> wrote:
> >> I've been considering naming it cp1252-whatwg.
> >
> > It would be nicer to put the organization name first, such as
> whatwg-cp1252
> > or maybe better html-cp1252. That would be more like ibm-932 and such.
>
> If you want to support more encodings than
> http://encoding.spec.whatwg.org/ defines I suggest using the prefix
> "web-". The organization may change and this is not tied to HTML.
>
>
> --
> http://annevankesteren.nl/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20140129/c20a0bdf/attachment.html>

From craig.gallacher at gmail.com  Wed Jan 29 14:04:53 2014
From: craig.gallacher at gmail.com (Craig Gallacher)
Date: Wed, 29 Jan 2014 20:04:53 +0000
Subject: Do you know a tool to decode "UTF-8 twice"
In-Reply-To: <CANDQx1p5kgZCrGgwbbPhN_K7xX=Eyy5TTho81ewxG5FZ9GuBnw@mail.gmail.com>
References: <trinity-0572f1ab-7333-465a-bf2f-275393a88e3f-1382949449351@3capp-webde-bs49>
 <20131028113410.wIOIFPvYvi4rX2/05EAm1cSW@dietcurd.local>
 <trinity-c2610e46-b75a-4935-81d7-21984bb9fe79-1382965565882@3capp-webde-bs49>
 <20131028162337.ZLPXM/tUQIjeRcuZdnxGd5T6@dietcurd.local>
 <CAN49p6pSVHmLeeXvSBXA7biMpck0SS26m-MAt=+y2kiA1QMGcg@mail.gmail.com>
 <20131029100950.q993HcL4O6e8kWFFZojh+81p@dietcurd.local>
 <trinity-65ff1aeb-d465-4a26-9164-9568df947870-1383063301198@3capp-webde-bs16>
 <527118E0.90501@gmail.com>
 <trinity-b7a05bc6-633d-42e6-976f-8bfe4a63e7a0-1383146033551@3capp-webde-bs26>
 <52712C94.7040102@gmail.com>
 <trinity-55d066ba-8eda-4966-9c6c-8beacccb3924-1383150733623@3capp-webde-bs26>
 <52713A3D.4090306@gmail.com>
 <CANDQx1rvuFVKHXg0SfzZgF=qTiNfqTKzRO-CksgdZRP1DXD6Bw@mail.gmail.com>
 <trinity-bf3e09ed-9a57-4e5e-8e01-0a63a2c83d17-1391007583446@3capp-webde-bs22>
 <CANDQx1qUJb1MCDU-pS0ROwpe8aNh8ZBhKTwA0YYAmuTD+QkYSg@mail.gmail.com>
 <CAN49p6pXjDM2c6rvme9+0Q30-9Zt=ZCTGHK-CDmxHbLGALPY6g@mail.gmail.com>
 <CADnb78i5787v08-PODpz8wyjn-8=6T9eZXnMjQdQozKyjMxr+Q@mail.gmail.com>
 <CANDQx1p5kgZCrGgwbbPhN_K7xX=Eyy5TTho81ewxG5FZ9GuBnw@mail.gmail.com>
Message-ID: <CAMJn+1BYgb5o2xq7a6iaWrjD-dQ8pB6AowOjqS7Mxtjt-K5TpA@mail.gmail.com>

Apologies I know this is on the website, but how do I unsubscribe from this
list?

Cheers
C

?
grampianmountains.net
?44 (0)7877 990538


On 29 January 2014 19:57, Buck Golemon <buck at yelp.com> wrote:

> Anne: Given that the intent is to implement exactly the whatwg spec, and
> the group is currently called "whatwg" (even though it may eventually
> become a historical artifact), is "whatwg-1252" most appropriate?
>
> Norbert Lindenberg previously suggested standardizing some kind of
> disambiguation.
> http://www.unicode.org/mail-arch/unicode-ml/y2012-m12/0022.html
>
> Do you most prefer the s/web-/cp/ pattern?
>
>
> On Wed, Jan 29, 2014 at 11:53 AM, Anne van Kesteren <annevk at annevk.nl> wrote:
>
>> On Wed, Jan 29, 2014 at 11:22 AM, Markus Scherer <markus.icu at gmail.com>
>> wrote:
>> > On Wed, Jan 29, 2014 at 10:21 AM, Buck Golemon <buck at yelp.com> wrote:
>> >> I've been considering naming it cp1252-whatwg.
>> >
>> > It would be nicer to put the organization name first, such as
>> whatwg-cp1252
>> > or maybe better html-cp1252. That would be more like ibm-932 and such.
>>
>> If you want to support more encodings than
>> http://encoding.spec.whatwg.org/ defines I suggest using the prefix
>> "web-". The organization may change and this is not tied to HTML.
>>
>>
>> --
>> http://annevankesteren.nl/
>>
>
>
> _______________________________________________
> Unicode mailing list
> Unicode at unicode.org
> http://unicode.org/mailman/listinfo/unicode
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20140129/89a333dc/attachment.html>

From textexin at xencraft.com  Wed Jan 29 15:09:16 2014
From: textexin at xencraft.com (Tex Texin)
Date: Wed, 29 Jan 2014 13:09:16 -0800
Subject: Do you know a tool to decode "UTF-8 twice"
In-Reply-To: <CANDQx1p5kgZCrGgwbbPhN_K7xX=Eyy5TTho81ewxG5FZ9GuBnw@mail.gmail.com>
References: <trinity-0572f1ab-7333-465a-bf2f-275393a88e3f-1382949449351@3capp-webde-bs49>
 <20131028113410.wIOIFPvYvi4rX2/05EAm1cSW@dietcurd.local>
 <trinity-c2610e46-b75a-4935-81d7-21984bb9fe79-1382965565882@3capp-webde-bs49>
 <20131028162337.ZLPXM/tUQIjeRcuZdnxGd5T6@dietcurd.local>
 <CAN49p6pSVHmLeeXvSBXA7biMpck0SS26m-MAt=+y2kiA1QMGcg@mail.gmail.com>
 <20131029100950.q993HcL4O6e8kWFFZojh+81p@dietcurd.local>
 <trinity-65ff1aeb-d465-4a26-9164-9568df947870-1383063301198@3capp-webde-bs16>
 <527118E0.90501@gmail.com>
 <trinity-b7a05bc6-633d-42e6-976f-8bfe4a63e7a0-1383146033551@3capp-webde-bs26>
 <52712C94.7040102@gmail.com>
 <trinity-55d066ba-8eda-4966-9c6c-8beacccb3924-1383150733623@3capp-webde-bs26>
 <52713A3D.4090306@gmail.com>
 <CANDQx1rvuFVKHXg0SfzZgF=qTiNfqTKzRO-CksgdZRP1DXD6Bw@mail.gmail.com>
 <trinity-bf3e09ed-9a57-4e5e-8e01-0a63a2c83d17-1391007583446@3capp-webde-bs22>
 <CANDQx1qUJb1MCDU-pS0ROwpe8aNh8ZBhKTwA0YYAmuTD+QkYSg@mail.gmail.com>
 <CAN49p6pXjDM2c6rvme9+0Q30-9Zt=ZCTGHK-CDmxHbLGALPY6g@mail.gmail.com>
 <CADnb78i5787v08-PODpz8wyjn-8=6T9eZXnMjQdQozKyjMxr+Q@mail.gmail.com>
 <CANDQx1p5kgZCrGgwbbPhN_K7xX=Eyy5TTho81ewxG5FZ9GuBnw@mail.gmail.com>
Message-ID: <00df01cf1d36$70f30d10$52d92730$@com>

Since it is neither cp1252 nor iso8859, perhaps call it whatwg-latin or whatwg-1.

If, or when, 1252 is updated to assign a character to an undefined
codepoint, it will be problematic to have them both refer to 1252.

For example, if a new currency symbol is added in Latin America, as has been
discussed from time to time.

 
Anyone writing decoders for the Whatwg encoding should also be on notice
that it is not necessarily a superset of 1252 going forward, and should
design for the potential distinction down the road.
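
One way to design for that distinction is to pin the table inside the decoder
instead of delegating to whatever the platform calls cp1252. The sketch below
is a hypothetical illustration, not an existing package: PINNED_HIGH,
decode_pinned and disagreements_with are names invented here, the 0x80-0x9F
row is typed in by hand from the published windows-1252 index, and the
comparison assumes Python 3's built-in cp1252 codec.

```python
# Bytes 0x80-0x9F of the web flavour of windows-1252, written out by hand
# so that nothing here depends on a platform codec changing later.
PINNED_HIGH = (
    "\u20ac\u0081\u201a\u0192\u201e\u2026\u2020\u2021"
    "\u02c6\u2030\u0160\u2039\u0152\u008d\u017d\u008f"
    "\u0090\u2018\u2019\u201c\u201d\u2022\u2013\u2014"
    "\u02dc\u2122\u0161\u203a\u0153\u009d\u017e\u0178"
)


def decode_pinned(data):
    """Decode bytes with the pinned table; all other bytes map to themselves."""
    return "".join(
        PINNED_HIGH[b - 0x80] if 0x80 <= b <= 0x9F else chr(b)
        for b in data
    )


def disagreements_with(codec_name="cp1252"):
    """Byte values that codec_name defines differently from the pinned table."""
    differing = []
    for value in range(256):
        raw = bytes([value])
        try:
            expected = raw.decode(codec_name)
        except UnicodeDecodeError:
            continue  # undefined in that codec, so there is nothing to compare
        if decode_pinned(raw) != expected:
            differing.append(value)
    return differing
```

An empty result from disagreements_with() means the pinned table is still a
superset of the other codec; a non-empty one would flag exactly the kind of
divergence described above.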

 
I am tempted to suggest we call it "Whatwg-Not-your-fathers-1252" which also
would serve appropriate notice?

tex

 
From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Buck Golemon
Sent: Wednesday, January 29, 2014 11:57 AM
To: Anne van Kesteren
Cc: unicode; unicode at norbertlindenberg.com; Jörg Knappen; Frédéric
Grosshans; Markus Scherer
Subject: Re: Re: Re: Re: Re: Re: Do you know a tool to decode "UTF-8 twice"

 
Anne: Given that the intent is to implement exactly the whatwg spec, and the
group is currently called "whatwg" (even though it may eventually become a
historical artifact), is "whatwg-1252" most appropriate?

 
Norbert Lindenberg previously suggested standardizing some kind of
disambiguation.

http://www.unicode.org/mail-arch/unicode-ml/y2012-m12/0022.html

 
Do you most prefer the s/web-/cp/ pattern?

 
On Wed, Jan 29, 2014 at 11:53 AM, Anne van Kesteren <annevk at annevk.nl>
wrote:

On Wed, Jan 29, 2014 at 11:22 AM, Markus Scherer <markus.icu at gmail.com>
wrote:
> On Wed, Jan 29, 2014 at 10:21 AM, Buck Golemon <buck at yelp.com> wrote:
>> I've been considering naming it cp1252-whatwg.
>
> It would be nicer to put the organization name first, such as
whatwg-cp1252
> or maybe better html-cp1252. That would be more like ibm-932 and such.

If you want to support more encodings than
http://encoding.spec.whatwg.org/ defines I suggest using the prefix
"web-". The organization may change and this is not tied to HTML.


--
http://annevankesteren.nl/

 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20140129/8cdc4551/attachment.html>

From prosfilaes at gmail.com  Wed Jan 29 15:45:09 2014
From: prosfilaes at gmail.com (David Starner)
Date: Wed, 29 Jan 2014 13:45:09 -0800
Subject: Do you know a tool to decode "UTF-8 twice"
In-Reply-To: <00df01cf1d36$70f30d10$52d92730$@com>
References: <trinity-0572f1ab-7333-465a-bf2f-275393a88e3f-1382949449351@3capp-webde-bs49>
 <20131028113410.wIOIFPvYvi4rX2/05EAm1cSW@dietcurd.local>
 <trinity-c2610e46-b75a-4935-81d7-21984bb9fe79-1382965565882@3capp-webde-bs49>
 <20131028162337.ZLPXM/tUQIjeRcuZdnxGd5T6@dietcurd.local>
 <CAN49p6pSVHmLeeXvSBXA7biMpck0SS26m-MAt=+y2kiA1QMGcg@mail.gmail.com>
 <20131029100950.q993HcL4O6e8kWFFZojh+81p@dietcurd.local>
 <trinity-65ff1aeb-d465-4a26-9164-9568df947870-1383063301198@3capp-webde-bs16>
 <527118E0.90501@gmail.com>
 <trinity-b7a05bc6-633d-42e6-976f-8bfe4a63e7a0-1383146033551@3capp-webde-bs26>
 <52712C94.7040102@gmail.com>
 <trinity-55d066ba-8eda-4966-9c6c-8beacccb3924-1383150733623@3capp-webde-bs26>
 <52713A3D.4090306@gmail.com>
 <CANDQx1rvuFVKHXg0SfzZgF=qTiNfqTKzRO-CksgdZRP1DXD6Bw@mail.gmail.com>
 <trinity-bf3e09ed-9a57-4e5e-8e01-0a63a2c83d17-1391007583446@3capp-webde-bs22>
 <CANDQx1qUJb1MCDU-pS0ROwpe8aNh8ZBhKTwA0YYAmuTD+QkYSg@mail.gmail.com>
 <CADnb78i5787v08-PODpz8wyjn-8=6T9eZXnMjQdQozKyjMxr+Q@mail.gmail.com>
 <CANDQx1p5kgZCrGgwbbPhN_K7xX=Eyy5TTho81ewxG5FZ9GuBnw@mail.gmail.com>
 <00df01cf1d36$70f30d10$52d92730$@com>
Message-ID: <CAMZ=zj4-GGO-nf+gmBTvMaM034ha=PUtha_VXH0FZseBEUSf5w@mail.gmail.com>

On Wed, Jan 29, 2014 at 1:09 PM, Tex Texin <textexin at xencraft.com> wrote:
> If, or when, 1252 is updated to assign a character to an undefined
> codepoint, it will be problematic to have them both refer to 1252.
>
> For example, if a new currency symbol is added in Latin America, as has been
> discussed from time to time.
>
>
>
> Anyone writing decoders for the Whatwg encoding should also be on notice
> that it is not necessarily a superset of 1252 going forward, and should
> design for the potential distinction down the road.

I don't believe there's any chance that CP-1252 is going to get new
changes. Unicode is king and the value for Microsoft of patching all
the supported Windows editions versus just telling people to use
Unicode is minimal. In any case, Microsoft has to interact with the
Whatwg definition of Latin-1/CP-1252 just as much as anyone else.

-- 
Kie ekzistas vivo, ekzistas espero.


From buck at yelp.com  Wed Jan 29 17:17:32 2014
From: buck at yelp.com (Buck Golemon)
Date: Wed, 29 Jan 2014 15:17:32 -0800
Subject: Do you know a tool to decode "UTF-8 twice"
In-Reply-To: <CAMZ=zj4-GGO-nf+gmBTvMaM034ha=PUtha_VXH0FZseBEUSf5w@mail.gmail.com>
References: <trinity-0572f1ab-7333-465a-bf2f-275393a88e3f-1382949449351@3capp-webde-bs49>
 <20131028113410.wIOIFPvYvi4rX2/05EAm1cSW@dietcurd.local>
 <trinity-c2610e46-b75a-4935-81d7-21984bb9fe79-1382965565882@3capp-webde-bs49>
 <20131028162337.ZLPXM/tUQIjeRcuZdnxGd5T6@dietcurd.local>
 <CAN49p6pSVHmLeeXvSBXA7biMpck0SS26m-MAt=+y2kiA1QMGcg@mail.gmail.com>
 <20131029100950.q993HcL4O6e8kWFFZojh+81p@dietcurd.local>
 <trinity-65ff1aeb-d465-4a26-9164-9568df947870-1383063301198@3capp-webde-bs16>
 <527118E0.90501@gmail.com>
 <trinity-b7a05bc6-633d-42e6-976f-8bfe4a63e7a0-1383146033551@3capp-webde-bs26>
 <52712C94.7040102@gmail.com>
 <trinity-55d066ba-8eda-4966-9c6c-8beacccb3924-1383150733623@3capp-webde-bs26>
 <52713A3D.4090306@gmail.com>
 <CANDQx1rvuFVKHXg0SfzZgF=qTiNfqTKzRO-CksgdZRP1DXD6Bw@mail.gmail.com>
 <trinity-bf3e09ed-9a57-4e5e-8e01-0a63a2c83d17-1391007583446@3capp-webde-bs22>
 <CANDQx1qUJb1MCDU-pS0ROwpe8aNh8ZBhKTwA0YYAmuTD+QkYSg@mail.gmail.com>
 <CADnb78i5787v08-PODpz8wyjn-8=6T9eZXnMjQdQozKyjMxr+Q@mail.gmail.com>
 <CANDQx1p5kgZCrGgwbbPhN_K7xX=Eyy5TTho81ewxG5FZ9GuBnw@mail.gmail.com>
 <00df01cf1d36$70f30d10$52d92730$@com>
 <CAMZ=zj4-GGO-nf+gmBTvMaM034ha=PUtha_VXH0FZseBEUSf5w@mail.gmail.com>
Message-ID: <CANDQx1rYGFP0O3twnXZJGDQopgUDeRGKweWrY83AZ3zngmQo6Q@mail.gmail.com>

On Wed, Jan 29, 2014 at 1:45 PM, David Starner <prosfilaes at gmail.com> wrote:

> On Wed, Jan 29, 2014 at 1:09 PM, Tex Texin <textexin at xencraft.com> wrote:
> > If, or when, 1252 is updated to assign a character to an undefined
> > codepoint, it will be problematic to have them both refer to 1252.
> >
> > For example, if a new currency symbol is added in Latin America, as has
> been
> > discussed from time to time.
> >
> >
> >
> > Anyone writing decoders for the Whatwg encoding should also be on notice
> > that it is not necessarily a superset of 1252 going forward, and should
> > design for the potential distinction down the road.
>
> I don't believe there's any chance that CP-1252 is going to get new
> changes. Unicode is king and the value for Microsoft of patching all
> the supported Windows editions versus just telling people to use
> Unicode is minimal. In any case, Microsoft has to interact with the
> Whatwg definition of Latin-1/CP-1252 just as much as anyone else.
>
>

Shawn Steele, the cp1252 owner, said:

> Our legacy code pages aren't going to change.  We won't add more characters
> to 1252.


 http://www.unicode.org/mail-arch/unicode-ml/y2012-m11/0202.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20140129/2b23581b/attachment.html>

From jknappen at web.de  Thu Jan 30 02:21:46 2014
From: jknappen at web.de ("Jörg Knappen")
Date: Thu, 30 Jan 2014 09:21:46 +0100 (CET)
Subject: Aw: Re: Re: Re: Re: Re: Re: Do you know a tool to decode "UTF-8 twice"
In-Reply-To: <CADnb78j58taYbXbbYs-raxShLJ69dPJtJYeagokxDbyQd4FwNw@mail.gmail.com>
References: <trinity-0572f1ab-7333-465a-bf2f-275393a88e3f-1382949449351@3capp-webde-bs49>
 <20131028113410.wIOIFPvYvi4rX2/05EAm1cSW@dietcurd.local>
 <trinity-c2610e46-b75a-4935-81d7-21984bb9fe79-1382965565882@3capp-webde-bs49>
 <20131028162337.ZLPXM/tUQIjeRcuZdnxGd5T6@dietcurd.local>
 <CAN49p6pSVHmLeeXvSBXA7biMpck0SS26m-MAt=+y2kiA1QMGcg@mail.gmail.com>
 <20131029100950.q993HcL4O6e8kWFFZojh+81p@dietcurd.local>
 <trinity-65ff1aeb-d465-4a26-9164-9568df947870-1383063301198@3capp-webde-bs16>
 <527118E0.90501@gmail.com>
 <trinity-b7a05bc6-633d-42e6-976f-8bfe4a63e7a0-1383146033551@3capp-webde-bs26>
 <52712C94.7040102@gmail.com>
 <trinity-55d066ba-8eda-4966-9c6c-8beacccb3924-1383150733623@3capp-webde-bs26>
 <52713A3D.4090306@gmail.com>
 <CANDQx1rvuFVKHXg0SfzZgF=qTiNfqTKzRO-CksgdZRP1DXD6Bw@mail.gmail.com>
 <trinity-bf3e09ed-9a57-4e5e-8e01-0a63a2c83d17-1391007583446@3capp-webde-bs22>
 <CANDQx1qUJb1MCDU-pS0ROwpe8aNh8ZBhKTwA0YYAmuTD+QkYSg@mail.gmail.com>
 <CAN49p6pXjDM2c6rvme9+0Q30-9Zt=ZCTGHK-CDmxHbLGALPY6g@mail.gmail.com>
 <CADnb78i5787v08-PODpz8wyjn-8=6T9eZXnMjQdQozKyjMxr+Q@mail.gmail.com>
 <CANDQx1p5kgZCrGgwbbPhN_K7xX=Eyy5TTho81ewxG5FZ9GuBnw@mail.gmail.com>,
 <CADnb78j58taYbXbbYs-raxShLJ69dPJtJYeagokxDbyQd4FwNw@mail.gmail.com>
Message-ID: <trinity-a3a5e6bf-7bf9-4b13-adf9-9f5dd8702cc2-1391070106446@3capp-webde-bs33>

An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20140130/e038138b/attachment.html>

From buck at yelp.com  Thu Jan 30 12:15:47 2014
From: buck at yelp.com (Buck Golemon)
Date: Thu, 30 Jan 2014 10:15:47 -0800
Subject: Aw: Re: Re: Re: Re: Re: Re: Do you know a tool to decode "UTF-8
 twice"
In-Reply-To: <trinity-a3a5e6bf-7bf9-4b13-adf9-9f5dd8702cc2-1391070106446@3capp-webde-bs33>
References: <trinity-0572f1ab-7333-465a-bf2f-275393a88e3f-1382949449351@3capp-webde-bs49>
 <20131028113410.wIOIFPvYvi4rX2/05EAm1cSW@dietcurd.local>
 <trinity-c2610e46-b75a-4935-81d7-21984bb9fe79-1382965565882@3capp-webde-bs49>
 <20131028162337.ZLPXM/tUQIjeRcuZdnxGd5T6@dietcurd.local>
 <CAN49p6pSVHmLeeXvSBXA7biMpck0SS26m-MAt=+y2kiA1QMGcg@mail.gmail.com>
 <20131029100950.q993HcL4O6e8kWFFZojh+81p@dietcurd.local>
 <trinity-65ff1aeb-d465-4a26-9164-9568df947870-1383063301198@3capp-webde-bs16>
 <527118E0.90501@gmail.com>
 <trinity-b7a05bc6-633d-42e6-976f-8bfe4a63e7a0-1383146033551@3capp-webde-bs26>
 <52712C94.7040102@gmail.com>
 <trinity-55d066ba-8eda-4966-9c6c-8beacccb3924-1383150733623@3capp-webde-bs26>
 <52713A3D.4090306@gmail.com>
 <CANDQx1rvuFVKHXg0SfzZgF=qTiNfqTKzRO-CksgdZRP1DXD6Bw@mail.gmail.com>
 <trinity-bf3e09ed-9a57-4e5e-8e01-0a63a2c83d17-1391007583446@3capp-webde-bs22>
 <CANDQx1qUJb1MCDU-pS0ROwpe8aNh8ZBhKTwA0YYAmuTD+QkYSg@mail.gmail.com>
 <CAN49p6pXjDM2c6rvme9+0Q30-9Zt=ZCTGHK-CDmxHbLGALPY6g@mail.gmail.com>
 <CADnb78i5787v08-PODpz8wyjn-8=6T9eZXnMjQdQozKyjMxr+Q@mail.gmail.com>
 <CANDQx1p5kgZCrGgwbbPhN_K7xX=Eyy5TTho81ewxG5FZ9GuBnw@mail.gmail.com>
 <CADnb78j58taYbXbbYs-raxShLJ69dPJtJYeagokxDbyQd4FwNw@mail.gmail.com>
 <trinity-a3a5e6bf-7bf9-4b13-adf9-9f5dd8702cc2-1391070106446@3capp-webde-bs33>
Message-ID: <CANDQx1qrEtWG49HkCuMdg4q4W7MKEoYhK7OQstNbg3z=G918nw@mail.gmail.com>

While I understand your argument, my intent was to suggest that
"mysql-latin1" was *not* as good as some other name. Surely you're not
arguing that all names are equivalently good. Obviously "mnmmmnmn" is a
worse name than "mysql-latin1".

"Mysql" has less to do with the issue than "whatwg" or "web", since this
codec is necessary any time you want to reproduce browser decoding,
regardless of whether mysql is involved. I contend that mysql adopted this
implementation because it is so popularly used for web applications.

"latin1" is less directly accurate than "cp1252". While whatwg requires
that latin1 be an alias of cp1252, it does the same for ascii, and it
maintains that the canonical name is "windows-1252".

Ideally you'd want to update the name of your project, but if not, that's
your preference :)

However if I can get some consensus on a least-bad name ("web-cp1252" with
alias "web-windows-1252" seems to be in the lead), I plan to release such a
codec.

This issue also extends far beyond Python. Any language that deals with the
web (i.e. all of them) and wants to be able to interpret (legacy) bytes
exactly as a browser would (admittedly a niche, but still important task)
needs such a codec. I believe unicode.org should eventually recognize such
a codec. Ideally it would reflect that this is the most-common
implementation of cp1252, but if I need to use a different name, that's
better than nothing at all.
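
To make the proposal concrete, here is a rough sketch of how such a codec
could be registered in Python 3 under the names being discussed. It is an
assumption-laden illustration, not the codec I intend to release: the helper
names _decode, _encode and _search are hypothetical, the names "web-cp1252"
and "web-windows-1252" are only the current front-runners, and the mapping is
simply Python's cp1252 plus the five C1 controls.

```python
import codecs

# Byte values that Python's cp1252 leaves undefined; the web flavour maps
# them to the C1 control characters with the same numeric value.
_C1_FALLBACK = frozenset((0x81, 0x8D, 0x8F, 0x90, 0x9D))


def _decode(data, errors="strict"):
    """Decode bytes-like input; every byte value has a mapping."""
    data = bytes(data)  # the codec machinery may hand over a memoryview
    text = "".join(
        chr(b) if b in _C1_FALLBACK else bytes([b]).decode("cp1252")
        for b in data
    )
    return text, len(data)


def _encode(text, errors="strict"):
    """Encode text, passing the five C1 controls through as single bytes."""
    out = bytearray()
    for ch in text:
        if ord(ch) in _C1_FALLBACK:
            out.append(ord(ch))
        else:
            out += ch.encode("cp1252", errors)
    return bytes(out), len(text)


def _search(name):
    """Codec search function; accepts either hyphens or underscores."""
    if name.replace("-", "_") in ("web_cp1252", "web_windows_1252"):
        return codecs.CodecInfo(name="web-cp1252",
                                encode=_encode, decode=_decode)
    return None


codecs.register(_search)
```

After registration, b"\xc3\x81".decode("web-cp1252") succeeds where the
built-in cp1252 codec raises an error. Incremental and stream variants are
left out to keep the sketch short.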


On Jan 30, 2014 12:31 AM, Jörg Knappen <jknappen at web.de> wrote:

>  When you are looking for a *new* name for that encoding, why don't you
> just adopt the pythonese precedent
> mysql-latin1 ? It is as good or as bad as any other name, but has some
> footing just now.
>
> --Jörg Knappen
>
> *Sent:* Wednesday, 29 January 2014 at 21:12
> *From:* "Anne van Kesteren" <annevk at annevk.nl>
> *To:* "Buck Golemon" <buck at yelp.com>
> *Cc:* "Markus Scherer" <markus.icu at gmail.com>, "Jörg Knappen" <
> jknappen at web.de>, "Frédéric Grosshans" <frederic.grosshans at gmail.com>,
> unicode <unicode at unicode.org>, unicode at norbertlindenberg.com
> *Subject:* Re: Re: Re: Re: Re: Re: Do you know a tool to decode "UTF-8
> twice"
> On Wed, Jan 29, 2014 at 11:57 AM, Buck Golemon <buck at yelp.com> wrote:
> > Anne: Given that the intent is to implement exactly the whatwg spec, and
> the
> > group is currently called "whatwg" (even though it may eventually become
> a
> > historical artifact), is "whatwg-1252" most appropriate?
>
> It's up to you I suppose, but "whatwg-1252" just seems like long term
> it will lose its meaning. For the web "windows-1252" will always have
> this meaning due to deployed content, so "web-windows-1252" if you
> need to disambiguate from a different implementation of windows-1252
> makes sense to me.
>
>
> --
> http://annevankesteren.nl/
>
> _______________________________________________
> Unicode mailing list
> Unicode at unicode.org
> http://unicode.org/mailman/listinfo/unicode
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20140130/2e927f20/attachment.html>