From doug at ewellic.org  Fri Jun  3 16:40:35 2016
From: doug at ewellic.org (Doug Ewell)
Date: Fri, 03 Jun 2016 14:40:35 -0700
Subject: Encoding the Mayan Script:
Message-ID: <20160603144035.665a7a7059d7ee80bb4d670165c8327d.79252965a8.wbe@email03.godaddy.com>

http://blog.unicode.org/2016/06/encoding-mayan-script-your-adopt.html

This is great news. Congratulations to both UTC and the sponsors for
helping to fund this worthwhile encoding effort.

--
Doug Ewell | http://ewellic.org | Thornton, CO
 

From steffen at sdaoden.eu  Sat Jun  4 06:08:31 2016
From: steffen at sdaoden.eu (Steffen Nurpmeso)
Date: Sat, 04 Jun 2016 13:08:31 +0200
Subject: Encoding the Mayan Script:
In-Reply-To: <20160603144035.665a7a7059d7ee80bb4d670165c8327d.79252965a8.wbe@email03.godaddy.com>
References: <20160603144035.665a7a7059d7ee80bb4d670165c8327d.79252965a8.wbe@email03.godaddy.com>
Message-ID: <20160604110831.g5-Yi-glJ%steffen@sdaoden.eu>

 |http://blog.unicode.org/2016/06/encoding-mayan-script-your-adopt.html
 |
 |This is great news. Congratulations to both UTC and the sponsors for
 |helping to fund this worthwhile encoding effort.

I concur with all my heart!

  Are usch? ocher z?ch war?l K'itsch?' ub'?'.

Good luck!

  Are uz?choschik wa'e:
    k'? kaz'ininoq, k'? katschamamoq, kaz'inonik,
      k'? kasilanik, k'? kalolonik, kat?lona putsch upa k?ch.

May the force be with you!

--steffen


From mathias at qiwi.be  Mon Jun  6 02:58:37 2016
From: mathias at qiwi.be (Mathias Bynens)
Date: Mon, 6 Jun 2016 10:58:37 +0300
Subject: UAX44: loose matching of symbolic values and the `is` prefix
Message-ID: <EA7FFB6E-083B-4A01-A1CD-00B69B9A3CC6@qiwi.be>

http://unicode.org/reports/tr44/#UAX44-LM3 mentions the `is` prefix:

> For loose matching of symbolic values, an initial prefix string "is" is ignored. [...] Ignoring any initial "is" on a symbolic value during loose matching is likely to produce the best results in application areas such as regex. Removal of an initial "is" string for a loose matching comparison only needs to be done once for a symbolic value, and need not be tested recursively. There are no property aliases or property value aliases of the form "isisisisistooconvoluted" defined just to test implementation edge cases.

UAX44 provides the reason for the existence of this "feature":

> The reason for this is that APIs returning property values are often named using the convention of prefixing "is" (or "Is" or "Is_", and so forth) to a property value.

That seems like a rather weak argument. Specifically applying this to UTS18 (Unicode regular expressions):

> "Script=Greek" is equivalent to "Script=isGreek" or "Script=Is_Greek"

If there is already a way to match all symbols in the Greek script (not counting the use of aliases and other loose matching requirements), i.e. `Script=Greek`, what good does it do to add support for yet another one?

Looking at implementations in the wild, Steven Levithan found (https://github.com/mathiasbynens/es-unicode-regexp-proposal/issues/2#issuecomment-143288062) that some regex flavors use `Is` for scripts, some for blocks, some for scripts and blocks, some for neither. Since some script and block names collide, this causes problems, especially when porting regexes across flavors.

The `is` prefix doesn't provide any functionality that would otherwise be unavailable. It doesn't add any value, yet it causes incompatibility and author confusion, and it increases implementation complexity. UAX 44 includes two entire paragraphs pointing out that last part:

> Removal of an initial "is" string for a loose matching comparison only needs to be done once for a symbolic value, and need not be tested recursively. There are no property aliases or property value aliases of the form "isisisisistooconvoluted" defined just to test implementation edge cases.
> 
> Existing and future property aliases and property value aliases are guaranteed to be unique within their relevant namespaces, even if an initial prefix string "is" is ignored. The existing cases of note for aliases that do start with "is" are: dt=Iso (Decomposition_Type=Isolated) and lb=IS. The Decomposition_Type value alias does not cause any problem, because there is no contrasting value alias dt=o (Decomposition_Type=olated). For lb=IS, note that the "IS" is the entire property value alias, and is not a prefix. There is no null value for the Line_Break property for it to contrast with, but implementations of loose matching should be careful of this edge case, so that "lb=IS" is not misinterpreted as matching a null value.


Backwards compatibility seems to be the only good reason to continue supporting the `is` prefix *for existing implementations*, such as the one in Perl. But why is it still a requirement for new engines to support it as part of UAX44-LM3?

I'd like to propose changing UAX44-LM3 to make supporting the `is` prefix optional for new implementations.
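
To make the rule under discussion concrete, here is a minimal Python sketch of one possible reading of UAX44-LM3. It is not taken from UAX #44 or from any existing engine: the names `fold` and `loose_match` are made up for illustration, and stripping the prefix from the query side only (after an exact comparison has failed) is an assumption of this sketch.

```python
import re


def fold(alias: str) -> str:
    """Ignore case, whitespace, underscores, and hyphens."""
    return re.sub(r"[\s_\-]", "", alias).lower()


def loose_match(query: str, alias: str) -> bool:
    """Compare a user-supplied name against a known alias, LM3-style.

    Assumption of this sketch: the folded forms are compared first, and
    only if that fails is a single initial "is" removed from the query.
    """
    q, a = fold(query), fold(alias)
    if q == a:
        # Covers lb=IS, where "IS" is the entire alias and not a prefix.
        return True
    # Strip the prefix once, never recursively, and never down to nothing.
    return q.startswith("is") and len(q) > 2 and q[2:] == a


if __name__ == "__main__":
    for query, alias in [("Is_Greek", "Greek"), ("IS", "IS"),
                         ("isIso", "Iso"), ("isisGreek", "Greek")]:
        print(query, alias, loose_match(query, alias))
```

Under these assumptions "Is_Greek" matches "Greek", "IS" still matches the Line_Break value "IS", and "isisGreek" does not match "Greek" because the prefix is removed only once.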


From sisrivas at blueyonder.co.uk  Mon Jun  6 04:11:15 2016
From: sisrivas at blueyonder.co.uk (srivas sinnathurai)
Date: Mon, 6 Jun 2016 10:11:15 +0100 (BST)
Subject: UAX44: loose matching of symbolic values and the `is` prefix
In-Reply-To: <EA7FFB6E-083B-4A01-A1CD-00B69B9A3CC6@qiwi.be>
References: <EA7FFB6E-083B-4A01-A1CD-00B69B9A3CC6@qiwi.be>
Message-ID: <398429781.2108169.1465204275981.JavaMail.open-xchange@oxbe16.tb.ukmail.iss.as9143.net>

Thanks Ashley.


From kenwhistler at att.net  Mon Jun  6 10:04:36 2016
From: kenwhistler at att.net (Ken Whistler)
Date: Mon, 6 Jun 2016 08:04:36 -0700
Subject: UAX44: loose matching of symbolic values and the `is` prefix
In-Reply-To: <EA7FFB6E-083B-4A01-A1CD-00B69B9A3CC6@qiwi.be>
References: <EA7FFB6E-083B-4A01-A1CD-00B69B9A3CC6@qiwi.be>
Message-ID: <cd2c0e5d-8239-01b0-9e64-f5ddd4912926@att.net>


On 6/6/2016 12:58 AM, Mathias Bynens wrote:
> Backwards compatibility seems to be the only good reason to continue supporting the `is` prefix *for existing implementations*, such as the one in Perl. But why is it still a requirement for new engines to support it as part of UAX44-LM3?
>
> I'd like to propose changing UAX44-LM3 to make supporting the `is` prefix optional for new implementations.
>

I think the target of concern here is wrong. UAX #44 doesn't *require* 
any regex engine to include this "is prefix" handling. What UAX #44 does 
is recommend that all property and property value aliases be correctly 
recognized, and then specifies a clear statement (in UAX44-LM3) of the 
loose matching rule for recognizing the various forms of those aliases 
that could be considered equivalent. I don't think messing with that 
rule statement (which has been in place since 2010) would be helpful.

The target instead should be in UTS #18, which happily, has a proposed 
update available for comment right now:

http://www.unicode.org/review/pri325/

The relevant point is:

http://www.unicode.org/reports/tr18/tr18-18.html#RL1.2

That is the conformance part that requires that conformant Unicode regex 
implementations "must follow the Matching rules from [UAX44]".

If you are seeking indulgences for new engine implementations, that 
seems like the correct point to be adding clarifications and exceptions. 
Note that the following text in that section already includes wording 
about exceptions and compatibility issues. There is also a following 
section specifically about regex for the Script and Script Extensions 
properties that seems like it would be the appropriate place to talk 
about the Greek/IsGreek issue as pertains to regex support.

I would suggest you make specific suggestions about the text of UTS #18 
as part of the ongoing public review for the proposed update of that 
specification.

--Ken


From mathias at qiwi.be  Mon Jun  6 10:25:12 2016
From: mathias at qiwi.be (Mathias Bynens)
Date: Mon, 6 Jun 2016 18:25:12 +0300
Subject: UAX44: loose matching of symbolic values and the `is` prefix
In-Reply-To: <cd2c0e5d-8239-01b0-9e64-f5ddd4912926@att.net>
References: <EA7FFB6E-083B-4A01-A1CD-00B69B9A3CC6@qiwi.be>
 <cd2c0e5d-8239-01b0-9e64-f5ddd4912926@att.net>
Message-ID: <9A8F2EF5-5716-442A-8DB9-A50E6C69E98D@qiwi.be>


> On 6 Jun 2016, at 18:04, Ken Whistler <kenwhistler at att.net> wrote:
> 
> UAX #44 doesn't *require* any regex engine to include this "is prefix" handling.

Are you referring to the fact that the first paragraph on http://unicode.org/reports/tr44/#Matching_Rules uses "strongly recommended" and "should" instead of "required" and "must"?

> What UAX #44 does is recommend that all property and property value aliases be correctly recognized, and then specifies a clear statement (in UAX44-LM3) of the loose matching rule for recognizing the various forms of those aliases that could be considered equivalent. I don't think messing with that rule statement (which has been in place since 2010) would be helpful.

Why not? What I had in mind was adding a small sentence like:

> For compatibility reasons, implementations may optionally support any initial prefix string "is".

This wouldn't be a breaking change in any way. It would enable new implementations that aim to follow UAX44 to do so without having to support `is`, and it would solve the problem everywhere the matching rules get applied rather than just for regular expressions.

> I think the target of concern here is wrong. 

Not sure I agree. It seems to me the `is` prefix is problematic (for the same reasons) wherever it's used, whether that's in regular expressions or not.

> The target instead should be in UTS #18, which happily, has a proposed update available for comment right now:
> 
> http://www.unicode.org/review/pri325/
> 
> The relevant point is:
> 
> http://www.unicode.org/reports/tr18/tr18-18.html#RL1.2
> 
> That is the conformance part that requires that conformant Unicode regex implementations "must follow the Matching rules from [UAX44]".

Thanks for the pointer! I will submit my feedback there as well. It seems more awkward / difficult to add an exception there rather than just slightly tweaking the UAX44-LM3 text as suggested above, though.

From doug at ewellic.org  Mon Jun  6 10:32:19 2016
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 06 Jun 2016 08:32:19 -0700
Subject: UAX44: loose matching of symbolic values and the `is` prefix
Message-ID: <20160606083219.665a7a7059d7ee80bb4d670165c8327d.db3c97e525.wbe@email03.godaddy.com>

Mathias Bynens wrote:

> Looking at implementations in the wild, Steven Levithan found
> (https://github.com/mathiasbynens/es-unicode-regexp-proposal/issues/2#issuecomment-143288062)
> that some regex flavors use `Is` for scripts, some for blocks, some
> for scripts and blocks, some for neither. Since some script and block
> names collide, this causes problems, especially when porting regexes
> across flavors. 

Are script names and block names expected to share a common namespace?
If they don't, then there is no collision.

LM3 says to ignore initial (and non-final) "is" for all property aliases
and property value aliases, not just Script and Block values. There will
be a lot of "collisions" if you take all of those into consideration.

> The `is` prefix doesn't provide any functionality that would otherwise
> be unavailable. It doesn't add any value, yet causes incompatibility,
> author confusion, and it increases implementation complexity.

I don't see any evidence that it adds no value. Support for existing
implementations is value.

--
Doug Ewell | http://ewellic.org | Thornton, CO


From mathias at qiwi.be  Mon Jun  6 10:40:45 2016
From: mathias at qiwi.be (Mathias Bynens)
Date: Mon, 6 Jun 2016 18:40:45 +0300
Subject: UAX44: loose matching of symbolic values and the `is` prefix
In-Reply-To: <20160606083219.665a7a7059d7ee80bb4d670165c8327d.db3c97e525.wbe@email03.godaddy.com>
References: <20160606083219.665a7a7059d7ee80bb4d670165c8327d.db3c97e525.wbe@email03.godaddy.com>
Message-ID: <65A2B7E5-B4D5-4605-B828-AC41BE7F540B@qiwi.be>

> 
>> The `is` prefix doesn't provide any functionality that would otherwise
>> be unavailable. It doesn't add any value, yet causes incompatibility,
>> author confusion, and it increases implementation complexity.
> 
> I don't see any evidence that it adds no value. Support for existing
> implementations is value.

It adds no value because it doesn't enable any new functionality.
I agree support for existing implementations would have some value, but given that existing implementations disagree on the properties for which they support `is`, that is not going to happen anyway. It's impossible to be compatible with all those different implementations at the same time.

From markus.icu at gmail.com  Mon Jun  6 11:09:11 2016
From: markus.icu at gmail.com (Markus Scherer)
Date: Mon, 6 Jun 2016 09:09:11 -0700
Subject: UAX44: loose matching of symbolic values and the `is` prefix
In-Reply-To: <65A2B7E5-B4D5-4605-B828-AC41BE7F540B@qiwi.be>
References: <20160606083219.665a7a7059d7ee80bb4d670165c8327d.db3c97e525.wbe@email03.godaddy.com>
 <65A2B7E5-B4D5-4605-B828-AC41BE7F540B@qiwi.be>
Message-ID: <CAN49p6qKF7W5yX_aWvxe0dZchhPNuW0KPWsVv4gtUQnPvLZazg@mail.gmail.com>

Interesting discussion!

ICU supports neither "is" nor "in" prefixes. I wasn't even aware that UAX
#44 loose matching prescribes "is". ICU just implements what
Property[Value]Aliases.txt say:

# Loose matching should be applied to all property names and property values, with
# the exception of String Property values. With loose matching of property names and
# values, the case distinctions, whitespace, hyphens, and '_' are ignored.


The prefixes seem gratuitous and confusing. For example, if I
read UAX44-LM3 right, it would allow [:isscript=isgreek:].
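
A small Python sketch of that reading, with and without the prefix rule. Everything here is hypothetical: the two lookup tables are tiny stand-ins for PropertyAliases.txt and PropertyValueAliases.txt, and `fold` and `resolve` are illustrative names, not ICU API.

```python
import re

# Hypothetical miniature alias tables, keyed by already-folded names.
PROPERTY_ALIASES = {"sc": "Script", "script": "Script"}
SCRIPT_VALUES = {"grek": "Greek", "greek": "Greek"}


def fold(name: str, ignore_is: bool) -> str:
    """Drop case, whitespace, hyphens, and '_'; optionally one initial "is"."""
    folded = re.sub(r"[\s_\-]", "", name).lower()
    if ignore_is and folded.startswith("is") and len(folded) > 2:
        folded = folded[2:]
    return folded


def resolve(expr: str, ignore_is: bool):
    """Resolve the inside of [:name=value:] to canonical names, or None."""
    prop, _, value = expr.partition("=")
    canonical_prop = PROPERTY_ALIASES.get(fold(prop, ignore_is))
    canonical_value = SCRIPT_VALUES.get(fold(value, ignore_is))
    if canonical_prop and canonical_value:
        return (canonical_prop, canonical_value)
    return None


if __name__ == "__main__":
    print(resolve("isscript=isgreek", ignore_is=True))
    print(resolve("isscript=isgreek", ignore_is=False))
    print(resolve("Script=Greek", ignore_is=False))
```

With `ignore_is=True` the expression `isscript=isgreek` resolves to Script=Greek; with `ignore_is=False`, which mirrors the looser rule quoted from the data file comment, it is rejected while `Script=Greek` still resolves.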

We do support just [:Greek:] for scripts and [:L:] for general categories.

I would rather not add support for the prefixes in ICU.

markus

From asmusf at ix.netcom.com  Mon Jun  6 11:48:27 2016
From: asmusf at ix.netcom.com (Asmus Freytag (c))
Date: Mon, 6 Jun 2016 09:48:27 -0700
Subject: UAX44: loose matching of symbolic values and the `is` prefix
In-Reply-To: <CAN49p6qKF7W5yX_aWvxe0dZchhPNuW0KPWsVv4gtUQnPvLZazg@mail.gmail.com>
References: <20160606083219.665a7a7059d7ee80bb4d670165c8327d.db3c97e525.wbe@email03.godaddy.com>
 <65A2B7E5-B4D5-4605-B828-AC41BE7F540B@qiwi.be>
 <CAN49p6qKF7W5yX_aWvxe0dZchhPNuW0KPWsVv4gtUQnPvLZazg@mail.gmail.com>
Message-ID: <8bb32d92-dc1e-c54a-8fb0-b25fe40f05fe@ix.netcom.com>

An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20160606/a9859018/attachment.html>

From patch.nova at gmail.com  Mon Jun  6 16:39:05 2016
From: patch.nova at gmail.com (Nova Patch)
Date: Mon, 6 Jun 2016 17:39:05 -0400
Subject: UAX44: loose matching of symbolic values and the `is` prefix
In-Reply-To: <20160606083219.665a7a7059d7ee80bb4d670165c8327d.db3c97e525.wbe@email03.godaddy.com>
References: <20160606083219.665a7a7059d7ee80bb4d670165c8327d.db3c97e525.wbe@email03.godaddy.com>
Message-ID: <CANV8b5YZJV8U-525MYDktninFgBoBxwVkOp2dTSi1d3tE9FO8w@mail.gmail.com>

On Monday, 6 June 2016, Doug Ewell wrote:
>
> Mathias Bynens wrote:
>
> > The `is` prefix doesn't provide any functionality that would otherwise
> > be unavailable. It doesn't add any value, yet causes incompatibility,
> > author confusion, and it increases implementation complexity.
>
> I don't see any evidence that it adds no value. Support for existing
> implementations is value.

Markus has now confirmed that ICU doesn't support this syntax, and I can
confirm that even Perl, which probably supports the most different ways to
write the same regex, doesn't support any form of the `is` prefix for
property values when the property name is provided.

$ perl -Mutf8 -E 'say "α" =~ /\p{Script=Greek}/'
1
$ perl -Mutf8 -E 'say "α" =~ /\p{Script=IsGreek}/'
Can't find Unicode property definition "Script=IsGreek" at -e line 1.
$ perl -Mutf8 -E 'say "α" =~ /\p{Script=Is_Greek}/'
Can't find Unicode property definition "Script=Is_Greek" at -e line 1.

Although Perl does optionally support the `is` prefix for property names
and standalone property values:

$ perl -Mutf8 -E 'say "α" =~ /\p{IsScript=Greek}/'
1
$ perl -Mutf8 -E 'say "α" =~ /\p{IsGreek}/'
1
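
The asymmetry in those five results can be summarized in a toy Python model. This is only a sketch of the observations above, not Perl's actual lookup algorithm; the function name `perl_like_accepts` is made up, and it only knows about the Script property and the value Greek.

```python
def strip_is(name: str) -> str:
    """Fold case and underscores, then drop one initial "is" if present."""
    folded = name.replace("_", "").lower()
    if folded.startswith("is") and len(folded) > 2:
        return folded[2:]
    return folded


def perl_like_accepts(expr: str) -> bool:
    """Toy model: does \\p{expr} resolve, given the results shown above?"""
    name, sep, value = expr.partition("=")
    if sep:
        # Compound form: the prefix is tolerated on the property name,
        # but the value must match without any prefix.
        folded_value = value.replace("_", "").lower()
        return strip_is(name) == "script" and folded_value == "greek"
    # Standalone form: the prefix is tolerated on the value itself.
    return strip_is(name) == "greek"


if __name__ == "__main__":
    for expr in ["Script=Greek", "Script=IsGreek", "Script=Is_Greek",
                 "IsScript=Greek", "IsGreek"]:
        print(expr, perl_like_accepts(expr))
```

The model accepts `Script=Greek`, `IsScript=Greek`, and `IsGreek`, and rejects `Script=IsGreek` and `Script=Is_Greek`, matching the output shown above.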

However, this syntax is notoriously inconsistent among different regex
engines. Perl's specific rules are documented in *perluniprops*
(http://perldoc.perl.org/perluniprops.html) as \p{Is_*} (case- and
underscore-insensitive) being a synonym for \p{*}, which explains the above
functionality. Based on my past research for *Unicode Regular Expression
Engines* at IUC38, I suspect that there might not be any regex engine that
actually supports syntax like Script=IsGreek as described in UAX44-LM3! If
anybody knows otherwise, I'd love to hear about it.

Nova

From oren.watson at gmail.com  Mon Jun  6 16:48:40 2016
From: oren.watson at gmail.com (Oren Watson)
Date: Mon, 6 Jun 2016 17:48:40 -0400
Subject: 72 New Emoji Characters
In-Reply-To: <5755CCBA.2000701@unicode.org>
References: <5755CCBA.2000701@unicode.org>
Message-ID: <CAKs2F=pSjoVmcOkVrn1PszMsL7NqKszDQnmg766Z1UWhuUhpDQ@mail.gmail.com>

I see this in the list of new emoji:
   GOAL NET
? marksmanship, sport shooting, hunting
This is incorrect: a goal net would be for football or hockey, not
marksmanship.

On Mon, Jun 6, 2016 at 3:19 PM, <announcements at unicode.org> wrote:

> The 72 new emoji characters for Unicode 9.0 are now
> final, and listed in Emoji Recently Added
> <http://www.unicode.org/emoji/charts/emoji-released.html>. They include 7
> faces, 7 people, 7 hand gestures, 14 plants/animals, 18 food emoji, 12
> sports emoji, and a few others. The corresponding documentation in *UTR
> #51 Unicode Emoji, Version 3.0 <http://www.unicode.org/reports/tr51/>*
> has also been updated, with additional guidelines for implementers and the
> new versions of the emoji data files. These should appear on smart phones
> and other devices that support emoji once vendors have a chance to update
> them.
>
> Four of the new emoji are added to complete gender pairs. Work has already
> begun on the Version 4.0 of Unicode Emoji, with a focus on further
> enhancing gender representation, and targeted to appear in the near future.
>
> The new emoji characters will soon be available for adoption
> <http://unicode.org/consortium/adopt-a-character.html>, helping support projects
> to improve language support
> <http://blog.unicode.org/2016/06/encoding-mayan-script-your-adopt.html>.
>
> http://blog.unicode.org/2016/06/72-new-emoji-characters.html
>
>

From mathias at qiwi.be  Mon Jun  6 22:11:49 2016
From: mathias at qiwi.be (Mathias Bynens)
Date: Tue, 7 Jun 2016 06:11:49 +0300
Subject: UAX44: loose matching of symbolic values and the `is` prefix
In-Reply-To: <CANV8b5YZJV8U-525MYDktninFgBoBxwVkOp2dTSi1d3tE9FO8w@mail.gmail.com>
References: <20160606083219.665a7a7059d7ee80bb4d670165c8327d.db3c97e525.wbe@email03.godaddy.com>
 <CANV8b5YZJV8U-525MYDktninFgBoBxwVkOp2dTSi1d3tE9FO8w@mail.gmail.com>
Message-ID: <B4EF6546-1534-422F-82A8-BB65F0B0A9FE@qiwi.be>


> On 7 Jun 2016, at 00:39, Nova Patch <patch.nova at gmail.com> wrote:
> 
> [...] Based on my past research for Unicode Regular Expression Engines at IUC38, I suspect that there might not be any regex engine that actually supports syntax like Script=IsGreek as described in UAX44-LM3! If anybody knows otherwise, I'd love to hear about it.

This seems like a cut-and-dried case of reality not matching the specification, which is not helpful in any way. The sensible thing to do is to update the specification accordingly, as proposed.

From doug at ewellic.org  Tue Jun  7 09:56:46 2016
From: doug at ewellic.org (Doug Ewell)
Date: Tue, 07 Jun 2016 07:56:46 -0700
Subject: UAX44: loose matching of symbolic values and the `is` prefix
Message-ID: <20160607075646.665a7a7059d7ee80bb4d670165c8327d.d8a17725c6.wbe@email03.godaddy.com>

Mathias Bynens replied to Nova Patch: 

>> [...] Based on my past research for Unicode Regular Expression
>> Engines at IUC38, I suspect that there might not be any regex engine
>> that actually supports syntax like Script=IsGreek as described in
>> UAX44-LM3! If anybody knows otherwise, I'd love to hear about it.
>
> This seems like a cut-and-dried case of reality not matching the
> specification, which is not helpful in any way. The sensible thing to
> do is to update the specification accordingly, as proposed. 

Rather than changing the spec based on anecdotal evidence, an even more
sensible thing to do would be to make this a Public Review Issue: "We're
considering simplifying this matching rule and need to know if any
implementers rely on the part we're planning to delete. Please send
feedback by $date."

There must have been some basis for including the "is" case in the first
place. It seems irresponsible to assume now that nobody anywhere needs
it.

--
Doug Ewell | http://ewellic.org | Thornton, CO


From mathias at qiwi.be  Tue Jun  7 14:13:09 2016
From: mathias at qiwi.be (Mathias Bynens)
Date: Tue, 7 Jun 2016 22:13:09 +0300
Subject: UAX44: loose matching of symbolic values and the `is` prefix
In-Reply-To: <20160607075646.665a7a7059d7ee80bb4d670165c8327d.d8a17725c6.wbe@email03.godaddy.com>
References: <20160607075646.665a7a7059d7ee80bb4d670165c8327d.d8a17725c6.wbe@email03.godaddy.com>
Message-ID: <5194BF5D-4EDD-4D02-87AD-308362B8A800@qiwi.be>


> On 7 Jun 2016, at 17:56, Doug Ewell <doug at ewellic.org> wrote:
> 
> Rather than changing the spec based on anecdotal evidence, [...]
> 
> It seems irresponsible to assume now that nobody anywhere needs
> it.

What assumption are you talking about? Markus and Nova provided actual examples of implementations not following the spec, and so far no one has been able to provide even a single counter-example.

> There must have been some basis for including the "is" case in the first
> place.

Now *that* sounds like an assumption to me.

From doug at ewellic.org  Tue Jun  7 14:51:57 2016
From: doug at ewellic.org (Doug Ewell)
Date: Tue, 07 Jun 2016 12:51:57 -0700
Subject: UAX44: loose matching of symbolic values and the `is` prefix
Message-ID: <20160607125156.665a7a7059d7ee80bb4d670165c8327d.b4f2f270ec.wbe@email03.godaddy.com>

Mathias Bynens wrote:

>> Rather than changing the spec based on anecdotal evidence, [...]
>>
>> It seems irresponsible to assume now that nobody anywhere needs
>> it.
>
> What assumption are you talking about? Markus and Nova provided actual
> examples of implementations not following the spec, and so far no one
> has been able to provide even a single counter-example.

I read the synopsis of Nova's IUC38 presentation, and it looks like he
did some pretty thorough research into regex engines, so I take back the
phrase "based on anecdotal evidence."

Changes to a Unicode specification that would have the effect of
removing functionality normally trigger a public review. They help tease
out the edge cases better than a mailing list discussion. The UTC has
done well to make frequent use of this mechanism when potentially
breaking changes are being considered.

>> There must have been some basis for including the "is" case in the
>> first place.
>
> Now *that* sounds like an assumption to me.

Do you suppose they just made it up out of whole cloth?

--
Doug Ewell | http://ewellic.org | Thornton, CO


From public at khwilliamson.com  Tue Jun  7 15:48:10 2016
From: public at khwilliamson.com (Karl Williamson)
Date: Tue, 7 Jun 2016 14:48:10 -0600
Subject: Adopting ZWJ
Message-ID: <5757330A.3000109@khwilliamson.com>

I heard that someone was considering adopting ZWJ.  They seemed to think 
that non-printables are not adoptable.  But I was unable to find a clear 
list of criteria.  The page that allows one to adopt said that it wasn't 
available, but that page really doesn't make it clear how one can test 
for this without actually doing the adoption.  (Since it doesn't 
actually ask for your credit card number on the initial page, one can 
back out before the final commitment, but that's not a very friendly 
interface)

From public at khwilliamson.com  Tue Jun  7 15:52:36 2016
From: public at khwilliamson.com (Karl Williamson)
Date: Tue, 7 Jun 2016 14:52:36 -0600
Subject: Adopting ZWJ
In-Reply-To: <5757330A.3000109@khwilliamson.com>
References: <5757330A.3000109@khwilliamson.com>
Message-ID: <57573414.30301@khwilliamson.com>

On 06/07/2016 02:48 PM, Karl Williamson wrote:
> I heard that someone was considering adopting ZWJ.  They seemed to think
> that non-printables are not adoptable.  But I was unable to find a clear
> list of criteria.  The page that allows one to adopt said that it wasn't
> available, but that page really doesn't make it clear how one can test
> for this without actually doing the adoption.  (Since it doesn't
> actually ask for your credit card number on the initial page, one can
> back out before the final commitment, but that's not a very friendly
> interface)
>

After I wrote that, I found this, which I had previously overlooked:

"You can't sponsor candidate characters (those not yet released in a 
version of Unicode, such as the Emoji Candidates), nor certain 
characters such as invisible ones."

But why this rule? Why should someone be forbidden to adopt ZWJ?


From charupdate at orange.fr  Tue Jun  7 19:25:13 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Wed, 8 Jun 2016 02:25:13 +0200 (CEST)
Subject: Adopting ZWJ
In-Reply-To: <57573414.30301@khwilliamson.com>
References: <5757330A.3000109@khwilliamson.com>
 <57573414.30301@khwilliamson.com>
Message-ID: <903819309.27994.1465345513630.JavaMail.www@wwinf1c20>

On Tue, 7 Jun 2016 14:52:36 -0600, Karl Williamson wrote:

> On 06/07/2016 02:48 PM, Karl Williamson wrote:
> > I heard that someone was considering adopting ZWJ. They seemed to think
> > that non-printables are not adoptable. But I was unable to find a clear
> > list of criteria. The page that allows one to adopt said that it wasn't
> > available, but that page really doesn't make it clear how one can test
> > for this without actually doing the adoption. (Since it doesn't
> > actually ask for your credit card number on the initial page, one can
> > back out before the final commitment, but that's not a very friendly
> > interface)
> >
> 
> After I wrote that, I found this, which I had previously overlooked:
> 
> "You can't sponsor candidate characters (those not yet released in a
> version of Unicode, such as the Emoji Candidates), nor certain
> characters such as invisible ones."
> 
> But why this rule? Why should someone be forbidden to adopt ZWJ?

Likewise, I seriously considered adopting NNBSP, which is very important
as a layout control, e.g. in the fr-FR locale, and is almost always stable
in applications, as opposed to NBSP. Indeed, I do not see any reason why
these characters should not be adoptable either, all the less so as there
*is* a visible representation that displays their abbreviation in a box.

However, I was aware from the beginning that my desire was unconventional.
At least it isn't the kind of ideal gift for your niece as referred to on
http://www.unicode.org/consortium/adopt-a-character.html


From public at khwilliamson.com  Tue Jun  7 21:39:07 2016
From: public at khwilliamson.com (Karl Williamson)
Date: Tue, 7 Jun 2016 20:39:07 -0600
Subject: Adopting ZWJ
In-Reply-To: <903819309.27994.1465345513630.JavaMail.www@wwinf1c20>
References: <5757330A.3000109@khwilliamson.com>
 <57573414.30301@khwilliamson.com>
 <903819309.27994.1465345513630.JavaMail.www@wwinf1c20>
Message-ID: <5757854B.6030902@khwilliamson.com>

On 06/07/2016 06:25 PM, Marcel Schneider wrote:
> Likewise I seriously considered adopting NNBSP, that is very important
> as a layout control, e.g. in the fr-FR locale, and is almost always stable
> in the applications, as opposed to NBSP. Indeed neither do I see any
> reason not to be able to adopt these characters, the less as there *is*
> a visible representation, displaying their abbreviation in a box.
>
> However I was aware from the beginning that my desire was unconventional.
> At least it isn't the kind of ideal gift for your niece as referred to on
> http://www.unicode.org/consortium/adopt-a-character.html
>

Actually, someone suggested to me, only partially tongue-in-cheek, that
Unicode pitch to Sesame Street
(https://en.wikipedia.org/wiki/Sesame_Street) that they adopt some
letters, as the show often says (or used to, anyway) that an episode is
brought to you by the letters Q and x (different letters sponsored
different episodes). Or maybe the pitch could be to the uncles and
aunts: "Now you can be like Sesame Street, and sponsor a letter."


From charupdate at orange.fr  Wed Jun  8 00:03:14 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Wed, 8 Jun 2016 07:03:14 +0200 (CEST)
Subject: Adopting ZWJ
In-Reply-To: <5757854B.6030902@khwilliamson.com>
References: <5757330A.3000109@khwilliamson.com>
 <57573414.30301@khwilliamson.com>
 <903819309.27994.1465345513630.JavaMail.www@wwinf1c20>
 <5757854B.6030902@khwilliamson.com>
Message-ID: <1017162842.394.1465362194446.JavaMail.www@wwinf1j18>

On Tue, 7 Jun 2016 20:39:07 -0600, Karl Williamson wrote:
> Actually, someone suggested to me, only partially tongue-in-cheek that
> Unicode pitch to Sesame Street
> (https://en.wikipedia.org/wiki/Sesame_Street) that they adopt some
> letters, as the show often (used to anyway) say that this episode is
> brought to you by the letters Q and x (different letters sponsored
> different episodes). Or maybe the pitch could be to the uncles and
> aunts, "Now you can be like Sesame Street, and sponsor a letter."

Sesame Street adopting all the letters its episodes are brought to you by would
be great, as would every child being given a character as a birthday gift at
least once in their life. I feel that this could be the way for Unicode to
ultimately become part of everybody's real world and find a place in people's
hearts. I'd like to tell the uncles and aunts not to stick with ASCII only, in
the hope that the young will then ask for a keyboard layout that has all their
characters on it, and make it!


From mark at macchiato.com  Wed Jun  8 03:03:47 2016
From: mark at macchiato.com (Mark Davis ☕️)
Date: Wed, 8 Jun 2016 10:03:47 +0200
Subject: Adopting ZWJ
In-Reply-To: <57573414.30301@khwilliamson.com>
References: <5757330A.3000109@khwilliamson.com>
 <57573414.30301@khwilliamson.com>
Message-ID: <CAJ2xs_HDQQYhi51A4P6tO7kxuc+0p6R5WbjE9S+P+aS434492Q@mail.gmail.com>

We wanted to be a bit conservative regarding those characters, partly
because we are using a payment service that is fussy. We could test it out
again, but our first priority is getting U9.0 out the door!

Mark


From davidj_faulks at yahoo.ca  Wed Jun  8 09:26:43 2016
From: davidj_faulks at yahoo.ca (David Faulks)
Date: Wed, 8 Jun 2016 14:26:43 +0000 (UTC)
Subject: No subject
References: <1529931306.384927.1465396003757.JavaMail.yahoo.ref@mail.yahoo.com>
Message-ID: <1529931306.384927.1465396003757.JavaMail.yahoo@mail.yahoo.com>

Hello,

Just a question here.

The Zodiac sign Capricorn has an alternate Glyph/Symbol (see below):
http://www.capricornzodiacsign.net/capricornsymbol.htm

It is only vaguely similar to the glyph found in the Unicode charts and astrological sites, and sometimes astrological software offers a choice between the two.

Since every font I have checked on my computer uses a glyph close to the one in the Unicode charts (if it has Zodiac symbols at all), I am thinking that it might be best to propose this as a separate character.

Is this a good idea? 

Also, Zodiac signs right now have Emoji representations. Would I have to submit this as an Emoji rather than a symbol? Would I have to make up a coloured Emoji Glyph?

Thanks for any responses.

David Faulks

From gwalla at gmail.com  Wed Jun  8 17:47:23 2016
From: gwalla at gmail.com (Garth Wallace)
Date: Wed, 8 Jun 2016 15:47:23 -0700
Subject: 
In-Reply-To: <1529931306.384927.1465396003757.JavaMail.yahoo@mail.yahoo.com>
References: <1529931306.384927.1465396003757.JavaMail.yahoo.ref@mail.yahoo.com>
 <1529931306.384927.1465396003757.JavaMail.yahoo@mail.yahoo.com>
Message-ID: <CA+p4_H3fOCgXWAM8kJSt1J+EZk=rQZR4goPtogfchUXfw9K8=Q@mail.gmail.com>

On Wed, Jun 8, 2016 at 7:26 AM, David Faulks <davidj_faulks at yahoo.ca> wrote:
> Hello,
>
> Just a question here.
>
> The Zodiac sign Capricorn has an alternate Glyph/Symbol (see below):
> http://www.capricornzodiacsign.net/capricornsymbol.htm
>
> It is only vaguely similar to the glyph found in the Unicode charts and astrological sites, and sometimes astrological software offers a choice between the two.
>
> Since every font I have checked on my computer uses a glyph close to the one in the Unicode charts (if it has Zodiac symbols at all), I am thinking that it might be best to propose this as a separate character.
>
> Is this a good idea?

Is it ever used alongside the more common symbol, with some semantic
distinction, or is it more of a stylistic choice?

> Also, Zodiac signs right now have Emoji representations. Would I have to submit this as an Emoji rather than a symbol? Would I have to make up a coloured Emoji Glyph?

I think the emoji representations of the standard zodiac symbols exist
because a Japanese cell phone provider put zodiac symbols in their
Shift-JIS emoji sets (since those symbols are not otherwise part of
the Shift-JIS standard).

From gwalla at gmail.com  Wed Jun  8 21:22:11 2016
From: gwalla at gmail.com (Garth Wallace)
Date: Wed, 8 Jun 2016 19:22:11 -0700
Subject: 
In-Reply-To: <1529931306.384927.1465396003757.JavaMail.yahoo@mail.yahoo.com>
References: <1529931306.384927.1465396003757.JavaMail.yahoo.ref@mail.yahoo.com>
 <1529931306.384927.1465396003757.JavaMail.yahoo@mail.yahoo.com>
Message-ID: <CA+p4_H3USyHfpMzomLsGnfirSSWKoNyO-uXbU5ojFnp1QuL09Q@mail.gmail.com>

On Wed, Jun 8, 2016 at 7:26 AM, David Faulks <davidj_faulks at yahoo.ca> wrote:
> Hello,
>
> Just a question here.
>
> The Zodiac sign Capricorn has an alternate Glyph/Symbol (see below):
> http://www.capricornzodiacsign.net/capricornsymbol.htm
>
> It is only vaguely similar to the glyph found in the Unicode charts and astrological sites, and sometimes astrological software offers a choice between the two.
>
> Since every font I have checked on my computer uses a glyph close to the one in the Unicode charts (if it has Zodiac symbols at all), I am thinking that it might be best to propose this as a separate character.
>
> Is this a good idea?

I just saw this alternate glyph pop up on a webpage, not as an image,
so I checked through the fonts on my system. It's apparently used for
U+2651 by the GNU FreeFont family, GNU Unifont, and Chrysanthi
Unicode. Chrysanthi does some odd things in the Miscellaneous Symbols
range but the others are pretty normal. It may just be a version of
the standard symbol with the loop enlarged and the left-hand side
reduced to a small wave or hook.

From davidj_faulks at yahoo.ca  Thu Jun  9 08:34:01 2016
From: davidj_faulks at yahoo.ca (David Faulks)
Date: Thu, 9 Jun 2016 13:34:01 +0000 (UTC)
Subject: Capricorn
References: <1681983953.125182.1465479241953.JavaMail.yahoo.ref@mail.yahoo.com>
Message-ID: <1681983953.125182.1465479241953.JavaMail.yahoo@mail.yahoo.com>

> On Wed, 6/8/16, Garth Wallace <gwalla at gmail.com> wrote:

>>  On Wed, Jun 8, 2016 at 7:26 AM, David Faulks
>> <davidj_faulks at yahoo.ca> wrote:
(cut text)
>> The Zodiac sign Capricorn has an alternate
>> Glyph/Symbol (see below):
(cut text)
>> Since every font I have checked on my computer,
>> uses a glyph close to the Unicode charts (if they
>> have Zodiac symbols at all), I am thinking that it might
>> be best to propose this as a separate character.
>>
>> Is this a good idea?
 
> I just saw this alternate glyph pop up on a webpage, not
> as an image, so I checked through the fonts on my
> system. It's apparently used for U+2651 by the GNU
> FreeFont family, GNU Unifont, and Chrysanthi Unicode.
> Chrysanthi does some odd things in the Miscellaneous
> Symbols range but the others are pretty normal. It may
> just be a version of the standard symbol with the loop
> enlarged and the left-hand side reduced to a small wave
> or hook.

Thank you for this information. Capricorn is pretty common,
and while I feel that completely different symbols for the
same thing should still be different characters, I was
uncertain in this case.

I might try to propose it as a standard variation, though.

David Faulks
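
For readers unfamiliar with variation sequences, here is a minimal Python
sketch of what one looks like at the code point level. It uses only the
existing text/emoji presentation sequences for U+2651 (the base character
followed by U+FE0E or U+FE0F). A sequence selecting the alternate Capricorn
glyph is not defined today and would have to be standardized first, so
nothing below encodes it. The helper name `describe` is made up for this
illustration.

```python
import unicodedata

CAPRICORN = "\u2651"   # CAPRICORN
VS15 = "\uFE0E"        # VARIATION SELECTOR-15: requests text presentation
VS16 = "\uFE0F"        # VARIATION SELECTOR-16: requests emoji presentation


def describe(s):
    """List the code points of s with their character names."""
    return " + ".join("U+%04X %s" % (ord(c), unicodedata.name(c)) for c in s)


# A variation sequence is the base character immediately followed by a selector.
text_style = CAPRICORN + VS15
emoji_style = CAPRICORN + VS16

if __name__ == "__main__":
    print(describe(text_style))
    print(describe(emoji_style))
```

Whether a font or renderer honours the selector is up to the implementation;
the sequence only records the requested presentation.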
 

From lang.support at gmail.com  Thu Jun  9 20:39:47 2016
From: lang.support at gmail.com (Andrew Cunningham)
Date: Fri, 10 Jun 2016 11:39:47 +1000
Subject: Mende Kikakui Number 10
Message-ID: <CAGJ7U-XCuGq1K9ki8YS92+0vMQ=cFwXYTP1aO=tT0tmbVnhaDA@mail.gmail.com>

Hi,

Currently I am doing some work on the Mende Kikakui script, and I was
wondering what the best way was to represent the number 10.

In the early proposals for the script there was a glyph and code point
specifically for the number 10. When the model for Mende Kikakui numbers
was changed before the code block was finalised, the number ten was
removed. Using the existing digits and numbers we can produce 1-9 and 11
onwards, but we cannot produce the number 10.

The number ten uses the same glyph as the syllable PU, U+1E88E.

Should I use U+1E88E to represent both the number 10 and the syllable PU?

Andrew

-- 
Andrew Cunningham
lang.support at gmail.com

From lang.support at gmail.com  Fri Jun 10 01:15:10 2016
From: lang.support at gmail.com (Andrew Cunningham)
Date: Fri, 10 Jun 2016 16:15:10 +1000
Subject: Mende Kikakui Number 10
In-Reply-To: <CAGJ7U-XCuGq1K9ki8YS92+0vMQ=cFwXYTP1aO=tT0tmbVnhaDA@mail.gmail.com>
References: <CAGJ7U-XCuGq1K9ki8YS92+0vMQ=cFwXYTP1aO=tT0tmbVnhaDA@mail.gmail.com>
Message-ID: <CAGJ7U-XgwZB_fsyn9FcDawUfbpdM_S7pjDCy4P7eh=mG7mWiQg@mail.gmail.com>

OK, looking at the issue again, I guess the other alternative is to have a
discontiguous set of numbers: represent 10 as U+1E8C7 U+1E8D1 and map that
sequence to the PU glyph within the font.

And hope that font developers base the glyph for it on PU, not on the
shapes of U+1E8C7 and U+1E8D1.

Andrew

-- 
Andrew Cunningham
lang.support at gmail.com

From verdy_p at wanadoo.fr  Fri Jun 10 01:52:59 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Fri, 10 Jun 2016 08:52:59 +0200
Subject: Mende Kikakui Number 10
In-Reply-To: <CAGJ7U-XgwZB_fsyn9FcDawUfbpdM_S7pjDCy4P7eh=mG7mWiQg@mail.gmail.com>
References: <CAGJ7U-XCuGq1K9ki8YS92+0vMQ=cFwXYTP1aO=tT0tmbVnhaDA@mail.gmail.com>
 <CAGJ7U-XgwZB_fsyn9FcDawUfbpdM_S7pjDCy4P7eh=mG7mWiQg@mail.gmail.com>
Message-ID: <CAGa7JC3BpMzJsXxBE0sJxvNjtfhqc0HhCCZSGnjH0j-69LnrUg@mail.gmail.com>

Given that there's no digit for zero, you need to append combining
characters to digits 1-9 in order to multiply them by a base
10/100/1,000/10,000/100,000/1,000,000. The system is then additive. I don't
know how zero is represented. Note that for base 10, when the first digit
is 1 (i.e. for numbers 11-19), the combining character is not 1E8D1 (TENS)
but 1E8D0 (TEENS), i.e. the slash-like glyph. But the description says that
TEENS is only for numbers 11-19, not for number 10.

But I agree that there should be a reference in
http://www.unicode.org/charts/PDF/U1E800.pdf, to the description in
http://www.unicode.org/versions/Unicode8.0.0/ch19.pdf (section 19.8, pages
722-723) that would explain how to render 10 (add some rows in table 19-6
for the numbers 10/100/.../1,000,000).

This leaves a hole in the description. I'm not sure that the glyph for PU
is exactly the glyph for 10. And what is the appropriate sequence:
ONE+TENS (1E8C7,1E8D1) or ONE+TEENS (1E8C7,1E8D0)? The description is
ambiguous, and probably both sequences should produce the equivalent glyph.
However, the letter PU (when meaning the number 10) looks more like the glyph
produced by ONE+TENS (1E8C7,1E8D1).

Then how should zero be represented? Probably by a syllable or word meaning
"none" (I don't know which one), or by using European or Arabic digits (as
indicated in Chapter 19).
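
The model described above can be sketched in Python as follows. This is only
an illustration under stated assumptions, not the standard's definition: it
takes the digits ONE..NINE to be contiguous from U+1E8C7, follows this thread
in reading digit+TEENS as 11-19, and treats ONE+TENS as the number 10, which
is exactly the point the description leaves open. The function names and the
`ten_as_one_tens` flag are invented for the example, and composite numbers
such as 25 are left out because their ordering is not discussed here.

```python
DIGIT_ONE = 0x1E8C7   # MENDE KIKAKUI DIGIT ONE; ONE..NINE assumed contiguous
TEENS = 0x1E8D0       # MENDE KIKAKUI COMBINING NUMBER TEENS
TENS = 0x1E8D1        # MENDE KIKAKUI COMBINING NUMBER TENS


def digit(d):
    """Code point of the digit d, for d in 1..9 (there is no zero digit)."""
    if not 1 <= d <= 9:
        raise ValueError("the script has digits for 1..9 only")
    return DIGIT_ONE + d - 1


def encode(n, ten_as_one_tens=True):
    """Code points for n in 1..19, or for a round ten from 20 to 90."""
    if 1 <= n <= 9:
        return [digit(n)]
    if n == 10:
        # ASSUMPTION: the standard's text does not say how to write 10.
        # ONE+TENS is the reading tentatively preferred in this thread.
        if not ten_as_one_tens:
            raise ValueError("no agreed sequence for 10")
        return [digit(1), TENS]
    if 11 <= n <= 19:
        # digit + TEENS means 10 + digit, so ONE+TEENS is 11
        return [digit(n - 10), TEENS]
    if 20 <= n <= 90 and n % 10 == 0:
        # digit + TENS means digit * 10
        return [digit(n // 10), TENS]
    raise ValueError("composite numbers are outside this sketch")


def show(code_points):
    """Format a sequence in U+XXXX notation."""
    return " ".join("U+%04X" % cp for cp in code_points)


if __name__ == "__main__":
    for n in (9, 10, 11, 19, 20):
        print(n, show(encode(n)))
```

Passing `ten_as_one_tens=False` makes the sketch refuse to encode 10, which
mirrors the current state of the description.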



From lang.support at gmail.com  Fri Jun 10 02:00:30 2016
From: lang.support at gmail.com (Andrew Cunningham)
Date: Fri, 10 Jun 2016 17:00:30 +1000
Subject: Mende Kikakui Number 10
In-Reply-To: <CAGa7JC3BpMzJsXxBE0sJxvNjtfhqc0HhCCZSGnjH0j-69LnrUg@mail.gmail.com>
References: <CAGJ7U-XCuGq1K9ki8YS92+0vMQ=cFwXYTP1aO=tT0tmbVnhaDA@mail.gmail.com>
 <CAGJ7U-XgwZB_fsyn9FcDawUfbpdM_S7pjDCy4P7eh=mG7mWiQg@mail.gmail.com>
 <CAGa7JC3BpMzJsXxBE0sJxvNjtfhqc0HhCCZSGnjH0j-69LnrUg@mail.gmail.com>
Message-ID: <CAGJ7U-V98TeUgAkfO1+=n+wJV335AKMM7XaZ17ua6y8XUSoSSg@mail.gmail.com>

Hi Philippe,

ONE+TEENS (1E8C7,1E8D0) is definitely the number 11.

A.

From verdy_p at wanadoo.fr  Fri Jun 10 03:54:07 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Fri, 10 Jun 2016 10:54:07 +0200
Subject: Mende Kikakui Number 10
In-Reply-To: <CAGJ7U-V98TeUgAkfO1+=n+wJV335AKMM7XaZ17ua6y8XUSoSSg@mail.gmail.com>
References: <CAGJ7U-XCuGq1K9ki8YS92+0vMQ=cFwXYTP1aO=tT0tmbVnhaDA@mail.gmail.com>
 <CAGJ7U-XgwZB_fsyn9FcDawUfbpdM_S7pjDCy4P7eh=mG7mWiQg@mail.gmail.com>
 <CAGa7JC3BpMzJsXxBE0sJxvNjtfhqc0HhCCZSGnjH0j-69LnrUg@mail.gmail.com>
 <CAGJ7U-V98TeUgAkfO1+=n+wJV335AKMM7XaZ17ua6y8XUSoSSg@mail.gmail.com>
Message-ID: <CAGa7JC0W8aX2RRQHw1T-3xRrsMkik53skUsc4QL8nAGP-AWaPQ@mail.gmail.com>

I do not dispute that about the number 11, but that was not the question!

The question was about the number **10**:
* ONE+TENS or ONE+TEENS?
This is NOT specified clearly in TUS Chapter 19, which speaks only of numbers
1-9, then 11-19 for TEENS, and 20-99 for TENS.

The same question arises for 110, 210, ..., 910:
* (ONE..NINE)+HUNDREDS+ONE+TENS or (ONE..NINE)+HUNDREDS+ONE+TEENS?

It seems to me that the answer to both questions is ONE+TENS, not ONE+TEENS.



From lang.support at gmail.com  Fri Jun 10 04:32:42 2016
From: lang.support at gmail.com (Andrew Cunningham)
Date: Fri, 10 Jun 2016 19:32:42 +1000
Subject: Mende Kikakui Number 10
In-Reply-To: <CAGJ7U-XPptSuRa=kGBJdRW86Uhs_O_mHH-H5_cfvSVf42-UEGA@mail.gmail.com>
References: <CAGJ7U-XCuGq1K9ki8YS92+0vMQ=cFwXYTP1aO=tT0tmbVnhaDA@mail.gmail.com>
 <CAGJ7U-XgwZB_fsyn9FcDawUfbpdM_S7pjDCy4P7eh=mG7mWiQg@mail.gmail.com>
 <CAGa7JC3BpMzJsXxBE0sJxvNjtfhqc0HhCCZSGnjH0j-69LnrUg@mail.gmail.com>
 <CAGJ7U-V98TeUgAkfO1+=n+wJV335AKMM7XaZ17ua6y8XUSoSSg@mail.gmail.com>
 <CAGa7JC0W8aX2RRQHw1T-3xRrsMkik53skUsc4QL8nAGP-AWaPQ@mail.gmail.com>
 <CAGJ7U-XPptSuRa=kGBJdRW86Uhs_O_mHH-H5_cfvSVf42-UEGA@mail.gmail.com>
Message-ID: <CAGJ7U-WrkjGYxMP1LFRCuT2OVLVEAdd5ye4PFW-byddzdH4RGQ@mail.gmail.com>

I'd agree that it is likely ONE+TENS.

Looking at the original proposal and articles on the number system, it
was originally 1-9, 10, 11-19, 20-99, etc.

But it became 1-9, 11-19, 20-99, etc. during the deliberations on the model
the numbers would follow.

A.

At least that's how I reconstruct it from the public documents I have seen.

-- 
Andrew Cunningham
lang.support at gmail.com

From lang.support at gmail.com  Fri Jun 10 04:55:58 2016
From: lang.support at gmail.com (Andrew Cunningham)
Date: Fri, 10 Jun 2016 19:55:58 +1000
Subject: Mende Kikakui Number 10
In-Reply-To: <CAGa7JC0W8aX2RRQHw1T-3xRrsMkik53skUsc4QL8nAGP-AWaPQ@mail.gmail.com>
References: <CAGJ7U-XCuGq1K9ki8YS92+0vMQ=cFwXYTP1aO=tT0tmbVnhaDA@mail.gmail.com>
 <CAGJ7U-XgwZB_fsyn9FcDawUfbpdM_S7pjDCy4P7eh=mG7mWiQg@mail.gmail.com>
 <CAGa7JC3BpMzJsXxBE0sJxvNjtfhqc0HhCCZSGnjH0j-69LnrUg@mail.gmail.com>
 <CAGJ7U-V98TeUgAkfO1+=n+wJV335AKMM7XaZ17ua6y8XUSoSSg@mail.gmail.com>
 <CAGa7JC0W8aX2RRQHw1T-3xRrsMkik53skUsc4QL8nAGP-AWaPQ@mail.gmail.com>
Message-ID: <CAGJ7U-WWGmSPf6+b_i8bSPcjFisR5qgRm2HHG29tNYLHCNzH=g@mail.gmail.com>

The original proposals included a specific number 10 codepoint. I assume it
was removed and its representation was to be generated by use of the
combining characters.

In the original proposal there was nothing corresponding to ONE+TENS;
instead there was a distinct number TEN. The glyph for number 10 was
identical to the glyph for syllable PU.

A.


On Friday, 10 June 2016, Philippe Verdy <verdy_p at wanadoo.fr> wrote:
> I do not contest that about number 11, and it was not the question !
> The question was about number **10**:
> * ONE+TENS or ONE+TEENS ?
> This is NOT specified clearly in TUS Chapter 19 which speaks about
numbers 1-9 then 11-19 for TEENS, and TENS for numbers 20-99.
> The question is the same about 110,210,...,910:
> * (ONE..NINE)+HUNDREDS+ONE+TENS or (ONE..NINE)+HUNDREDS+ONE+TEENS ?
> For me it seems that both questions will be answered with ONE+TENS, not
ONE+TEENS.
>
> 2016-06-10 9:00 GMT+02:00 Andrew Cunningham <lang.support at gmail.com>:
>>
>> Hi Philippe,
>>
>> ONE+TEENS (1E8C7,1E8D0) is definitely the number 11
>>
>> A.
>>
>> On 10 Jun 2016 4:53 pm, "Philippe Verdy" <verdy_p at wanadoo.fr> wrote:
>>>
>>> Given that there's no digit for zero, you need to append combining
characters to digits 1-9 in order to multiply them by a base
10/100/1,000/10,000/100,000/1,000,000. The system is then additive. I don't
know how zero is represented. Note that for base 10, when the first digit
is 1 (i.e. for numbers 11-19), the combining character is not 1E8D1 (TENS)
but 1E8D0 (TEENS), i.e. the slash-like glyph. But the description says that
TEENS is only for numbers 11-19, not for number 10.
>>> But I agree that there should be a reference in
http://www.unicode.org/charts/PDF/U1E800.pdf, to the description in
http://www.unicode.org/versions/Unicode8.0.0/ch19.pdf (section 19.8, pages
722-723) that would explain how to render 10 (add some rows in table 19-6
for the numbers 10/100/.../1,000,000).
>>> This leaves a hole in the description. I'm not sure that the glyph for
PU is exactly the glyph for 10. Or what is the appropriate sequence:
ONE+TENS (1E8C7,1E8D1) or ONE+TEENS (1E8C7,1E8D0) ? The description is
ambiguous, and probably both sequences should produce the equivalent glyph.
However the letter PU (when meaning number 10) looks more like the glyph
produced by ONE+TEN (1E8C7,1E8D1).
>>> Then how to represent zero ? Probably by a syllable or word meaning
"none" (don't know which it is), or by using European or Arabic digits (as
indicated in Chapter 19).
>>>
>>>
>>> 2016-06-10 8:15 GMT+02:00 Andrew Cunningham <lang.support at gmail.com>:
>>>>
>>>> Ok looking at issue again I guess the other alternative is to have a
discontiguous set of numbers. Represent 10 as U+1E8C7 U+1E8D1 and map it
within the font to the PU glyph.
>>>>
>>>> And hope that font developers don't create a glyph based on shape of
 U+1E8C7 and U+1E8D1,  but PU instead.
>>>>
>>>> Andrew
>>>>
>>>> On Friday, 10 June 2016, Andrew Cunningham <lang.support at gmail.com>
wrote:
>>>> > Hi,
>>>> > Currently I am doing some work on the Mende Kikakui script, and I
was wondering what the best way was to represent the number 10.
>>>> > In the early proposals for the script there was a glyph and
codepoint specifically for the number 10. When the model for Mende Kikakui
numbers was changed before the finalising of the code block, the number ten
was removed. But using existing digits and numbers we can produce 1-9 and
11 -> but we can not produce the number 10 from digits and numbers.
>>>> > The number ten uses the same glyph as  syllable PU U+1E88E.
>>>> > Should I use U+1E88E to represent both the number 10 and the
syllable PU?
>>>> > Andrew
>>>> >
>>>> > --
>>>> > Andrew Cunningham
>>>> > lang.support at gmail.com
>>>> >
>>>> >
>>>> >
>>>>
>>>> --
>>>> Andrew Cunningham
>>>> lang.support at gmail.com
>>>>
>>>>
>>>>
>>>
>
>

-- 
Andrew Cunningham
lang.support at gmail.com

From frederic.grosshans at gmail.com  Fri Jun 10 10:16:51 2016
From: frederic.grosshans at gmail.com (=?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Grosshans?=)
Date: Fri, 10 Jun 2016 17:16:51 +0200
Subject: Mende Kikakui Number 10
In-Reply-To: <CAGJ7U-XgwZB_fsyn9FcDawUfbpdM_S7pjDCy4P7eh=mG7mWiQg@mail.gmail.com>
References: <CAGJ7U-XCuGq1K9ki8YS92+0vMQ=cFwXYTP1aO=tT0tmbVnhaDA@mail.gmail.com>
 <CAGJ7U-XgwZB_fsyn9FcDawUfbpdM_S7pjDCy4P7eh=mG7mWiQg@mail.gmail.com>
Message-ID: <575AD9E3.8060301@gmail.com>

If you look at the documents archived for 2012 
(http://www.unicode.org/L2/L2013/13001-register-2012.htm), you will 
find, beyond the Mende proposal 
(http://www.unicode.org/L2/L2012/12023-n4167-mende.pdf), several 
documents by Deborah Anderson focused on the problem of the encoding 
model for Mende numbers 
(http://www.unicode.org/L2/L2012/12049-mende-model.pdf , 
http://www.unicode.org/L2/L2012/12265-mende-numbers.pdf ). They all 
discuss the problem posed by the representation of 10 in a model using 
combining characters, and the ambiguity of its representation.

Then there is a document 
(http://www.unicode.org/L2/L2012/12335-n4375-mende-adhoc.pdf) on the ad 
hoc meeting that decided the (different) encoding model which has been 
kept for Unicode. But neither this document nor the Unicode Standard 
explicitly says how to represent 10, or says that 10 has an inherent dot. 
The document explicitly says that "precomposed glyphs in smart fonts 
will give the best representation", so my reading is almost the same as 
yours:

On 10/06/2016 08:15, Andrew Cunningham wrote:
>  Represent 10 as U+1E8C7 U+1E8D1 and map it within the font to the PU 
> glyph.
except that the vertical line of PU goes beyond its "bowl", which is not 
the case for the glyph for 10, which should look like the glyph for 
TENS, with a dot above.

>
> And hope that font developers don't create a glyph based on shape of 
>  U+1E8C7 and U+1E8D1,  but PU instead.

Once someone present in the ad-hoc Mende meeting (some read this list) 
confirms (or corrects) this interpretation, I guess it will be time to 
add some clarification in the standard.

        Frédéric


From verdy_p at wanadoo.fr  Fri Jun 10 11:05:33 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Fri, 10 Jun 2016 18:05:33 +0200
Subject: Mende Kikakui Number 10
In-Reply-To: <575AD9E3.8060301@gmail.com>
References: <CAGJ7U-XCuGq1K9ki8YS92+0vMQ=cFwXYTP1aO=tT0tmbVnhaDA@mail.gmail.com>
 <CAGJ7U-XgwZB_fsyn9FcDawUfbpdM_S7pjDCy4P7eh=mG7mWiQg@mail.gmail.com>
 <575AD9E3.8060301@gmail.com>
Message-ID: <CAGa7JC377FikiHDVDFC9+DhTaj6JtJp0oTbN_pBaTMjDafhwUA@mail.gmail.com>

I have reread the doc several times (I had not read it carefully before),
and in fact Chapter 19 is not clear at all.

OK, <ONE;combining TEENS> represents 11, but <ONE;combining TENS> does not
clearly represent 10, and the proposals do not exhibit 10 with the same
glyph as PU (even if it is based on it; in fact the combining TENS is a
small subscript glyph variant of letter/syllable PU intended to mark
digits).

Using letter PU would discard the initial digit 1, and the subscript
variant, making it confusable with a real letter/syllable PU.

The initial proposal for number 10 was a PU with a dot, i.e. instead of
small-subscripting the PU glyph for TENS, the PU glyph is still used, but
it is the initial digit one (normally a vertical stroke) which is
superscripted as a smaller tick (and in my opinion this tick should join
with the letter PU, just as the other digits+TENS are displayed by
attaching the TENS subscript to the standard digit).

I've made some other searches, and digits+TENS are also rendered by
combining two glyphs of equal vertical size stacked on top of each other
(so the base digit becomes a superscript variant and the TENS is also a
subscript), except that in this mode everything remains above the baseline
(no need for descenders): numbers are rendered completely with sequences of
combined digits all having the same vertical height, like other
letters/syllables.

So I don't think that using letter PU can correctly represent the number
10. <ONE+combining TENS> is the way to go (it is then followed by
<ONE+combining TEENS> for 11... <NINE+Combining TEENS> for 19, then
<TWO+combining TENS> for 20, <TWO+combining TENS+ONE> for 21...).
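
As a rough illustration of the sequence listed above, here is a minimal
Python sketch that builds the code point sequences for 1 to 99. It assumes
the digits ONE..NINE are contiguous from U+1E8C7 and uses the TEENS and TENS
code points quoted in this thread; the function names are made up for the
example, and the choice of ONE+TENS for 10 is the reading argued for here,
not something the standard states explicitly.

```python
# Code points as cited in this thread; the names are informal labels.
DIGIT_ONE = 0x1E8C7  # assumption: ONE..NINE are contiguous, U+1E8C7..U+1E8CF
TEENS = 0x1E8D0      # combining mark used for 11-19
TENS = 0x1E8D1       # combining mark used for multiples of ten


def digit(d):
    """Return the digit character for 1..9 (the script has no zero)."""
    if not 1 <= d <= 9:
        raise ValueError("no digit for %r" % (d,))
    return chr(DIGIT_ONE + d - 1)


def encode_1_to_99(n):
    """Hypothetical helper: encode n following the pattern described above."""
    if not 1 <= n <= 99:
        raise ValueError("this sketch only covers 1..99")
    tens, units = divmod(n, 10)
    if tens == 0:
        return digit(units)
    if tens == 1 and units:
        # 11..19: the unit digit followed by combining TEENS
        return digit(units) + chr(TEENS)
    # 10, 20, ..., 90: the tens digit followed by combining TENS
    out = digit(tens) + chr(TENS)
    if units:
        out += digit(units)  # e.g. 21 = TWO + TENS + ONE
    return out


def show(text):
    """Format a string as space-separated hexadecimal code points."""
    return " ".join("%04X" % ord(ch) for ch in text)
```

For example, show(encode_1_to_99(10)) gives "1E8C7 1E8D1" and
show(encode_1_to_99(11)) gives "1E8C7 1E8D0".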

Now for fonts, the sequences with <digit+combining TENS> and
<digit+combining TEENS> both require changing the shape and reducing
vertical size of the initial base digit. There's no complex change of shape
for the combining mark itself: it stacks vertically normally below the
reduced initial digit. There's no case where both combining marks would be
used together for some special meaning, and no evidence that these marks
can be repeated: there can be only one combining TENS or one combining
TEENS.
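
The constraint just described (one TENS or one TEENS per base digit, never
both, never repeated) could be checked along these lines. This is only a
sketch of that reading, not a rule taken from the standard; the function
name is hypothetical and the digit range U+1E8C7..U+1E8CF is assumed.

```python
TEENS = "\U0001E8D0"  # combining TEENS, as cited in this thread
TENS = "\U0001E8D1"   # combining TENS, as cited in this thread
# Assumption: digits ONE..NINE occupy U+1E8C7..U+1E8CF.
DIGITS = {chr(cp) for cp in range(0x1E8C7, 0x1E8D0)}


def tens_marks_ok(text):
    """Return True if every TENS/TEENS mark directly follows a digit.

    Requiring a digit immediately before each mark also rejects doubled
    marks and a TENS+TEENS pair on the same base.
    """
    prev = None
    for ch in text:
        if ch in (TEENS, TENS) and prev not in DIGITS:
            return False
        prev = ch
    return True
```

With this check, ONE+TENS and TWO+TENS+ONE pass, while ONE+TENS+TEENS or a
bare TENS with no base digit are rejected.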

Other diacritics however may be used if needed for additional notations
outside the number itself (such as arrows or enclosing marks), and would be
encoded after the <digit+TENS> or <digit+TEENS>.

But encoding a standalone digit 10 would have been better (and probably
extending it to standalone versions for 11 and 12, for use with month
numbers and hours on clocks, just as with Roman digits). It would be
interesting to look at how traditional solar clocks or traditional
calendars, or even "modern" mechanical clocks with displays in Kikakui
Mende, show these common numbers 10, 11, 12 (maybe there are photos
or facsimiles of artworks or "real life" photos kept in some museum or in
a library, or videos showing some religious celebrations or social events
where these digits would have been displayed or taught).


2016-06-10 17:16 GMT+02:00 Frédéric Grosshans <frederic.grosshans at gmail.com>:

> If you look at the documents archived for 2012 (
> http://www.unicode.org/L2/L2013/13001-register-2012.htm), you will find,
> beyond the Mende proposal (
> http://www.unicode.org/L2/L2012/12023-n4167-mende.pdf), several documents
> by Deborah Anderson focused on the problem of the encoding model Mende
> Numbers. (http://www.unicode.org/L2/L2012/12049-mende-model.pdf ,
> http://www.unicode.org/L2/L2012/12265-mende-numbers.pdf ). They all
> discuss the problem posed by the representation of 10 in a model using
> combining character, and the ambiguity of its representation.
>
> Then there is a document (
> http://www.unicode.org/L2/L2012/12335-n4375-mende-adhoc.pdf) on the ad
> hoc meeting that decided the (different) encoding model which has been kept
> for Unicode. But neither this document nor the Unicode Standard explicitly
> says how to represent 10, or says that 10 has an inherent dot. The document
> explicitly says that "precomposed glyphs in smart fonts will give the best
> representation", so my reading is almost the same as yours:
>
> On 10/06/2016 08:15, Andrew Cunningham wrote:
>
>>  Represent 10 as U+1E8C7 U+1E8D1 and map it within the font to the PU
>> glyph.
>>
> except that the vertical line of PU goes beyond its "bowl", which is not
> the case for the glyph for 10, which should look like the glyph for TENS,
> with a dot above.
>
>
>> And hope that font developers don't create a glyph based on shape of
>> U+1E8C7 and U+1E8D1,  but PU instead.
>>
>
> Once someone present in the ad-hoc Mende meeting (some read this list)
> confirms (or corrects) this interpretation, I guess it will be time to add
> some clarification in the standard.
>
>        Frédéric
>
>

From frederic.grosshans at gmail.com  Fri Jun 10 11:43:21 2016
From: frederic.grosshans at gmail.com (=?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Grosshans?=)
Date: Fri, 10 Jun 2016 18:43:21 +0200
Subject: Mende Kikakui Number 10
In-Reply-To: <CAGa7JC377FikiHDVDFC9+DhTaj6JtJp0oTbN_pBaTMjDafhwUA@mail.gmail.com>
References: <CAGJ7U-XCuGq1K9ki8YS92+0vMQ=cFwXYTP1aO=tT0tmbVnhaDA@mail.gmail.com>
 <CAGJ7U-XgwZB_fsyn9FcDawUfbpdM_S7pjDCy4P7eh=mG7mWiQg@mail.gmail.com>
 <575AD9E3.8060301@gmail.com>
 <CAGa7JC377FikiHDVDFC9+DhTaj6JtJp0oTbN_pBaTMjDafhwUA@mail.gmail.com>
Message-ID: <575AEE29.90308@gmail.com>

On 10/06/2016 18:05, Philippe Verdy wrote:
>
> OK, <ONE;combining TEENS> represents 11, but <ONE;combining TENS> does 
> not clearly represent 10, and the proposals do not exhibit 10 with 
> the same glyph as PU (even if it is based on it, in fact the combining 
> TENS is a small subscript glyph variant of letter/syllable PU intended 
> to mark digits).
>
> Using letter PU would discard the initial digit 1, and the subscript 
> variant, making it confusable with a real letter/syllable PU.
>
> The initial proposal for number 10 was a PU with a dot, i.e. instead 
> of small-subscripting the PU glyph for TENS, the PU glyph is still 
> used, but it is the initial digit one (normally a vertical stroke) 
> which is superscripted as a smaller tick (and in my opinion this tick 
> should join with the letter PU, just as the other digits+TENS are 
> displayed by attaching the TENS subscript to the standard digit).
>
Reading the proposal again, there is a mention that the glyph for 10 
(puu) may be related to the one for PU (see page 3). They look really 
similar and both have the same dot above, but the difference is the 
extent of the vertical line on the right side. The normal way to write 
10 does NOT include a digit 1 (see the discussion at the end of p. 4, 
where it is explicitly stated), hence the confusion about the proper 
encoding of the number 10.


[...]
>
> But encoding a standalone digit 10 would have been better
It has certainly been considered, and one can guess from the ad-hoc 
document that many solutions have been evaluated and defended during 
this meeting, and the final decision was a practical compromise. The 
problem with the standalone number 10 is that native users of the 
script see it as the same symbol as the TENS number, with an inherent 
dot which disappears when combined with something else.

> (and probably extending it to standalone versions for 11 and 12, for 
> usage with months numbers and hours on clock, just like with Roman 
> digits).
No! Roman numerals were included for compatibility with East Asian 
standards. They are compatibility characters.

> It would be interesting to look at how traditional solar clocks or 
> traditional calendars, or even "modern" mechanical clocks with 
> displays in Kikakui Mende, are showing these common numbers 10,11,12 
> (may be there are photos or facsimiles of artworks or "real life" 
> photos kept in some museum or in book library or videos showing some 
> religious celebrations or social events where these digits would have 
> been displayed or taught).
>

It may be interesting, but no standardisation happens because we 
speculate that "may be there are photos" showing these characters, which 
are presumably encodable as sequences!

   Frédéric

From verdy_p at wanadoo.fr  Fri Jun 10 13:51:02 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Fri, 10 Jun 2016 20:51:02 +0200
Subject: Mende Kikakui Number 10
In-Reply-To: <575AEE29.90308@gmail.com>
References: <CAGJ7U-XCuGq1K9ki8YS92+0vMQ=cFwXYTP1aO=tT0tmbVnhaDA@mail.gmail.com>
 <CAGJ7U-XgwZB_fsyn9FcDawUfbpdM_S7pjDCy4P7eh=mG7mWiQg@mail.gmail.com>
 <575AD9E3.8060301@gmail.com>
 <CAGa7JC377FikiHDVDFC9+DhTaj6JtJp0oTbN_pBaTMjDafhwUA@mail.gmail.com>
 <575AEE29.90308@gmail.com>
Message-ID: <CAGa7JC1Ko1qo6vOcEmBNwM_D0Dv_REBKH07Bxr_b0ETrLh128w@mail.gmail.com>

2016-06-10 18:43 GMT+02:00 Frédéric Grosshans <frederic.grosshans at gmail.com>:

> On 10/06/2016 18:05, Philippe Verdy wrote:
>
>>
>> OK, <ONE;combining TEENS> represents 11, but <ONE;combining TENS> does not
>> clearly represent 10, and the proposals do not exhibit 10 with the same
>> glyph as PU (even if it is based on it, in fact the combining TENS is a
>> small subscript glyph variant of letter/syllable PU intended to mark
>> digits).
>>
>> Using letter PU would discard the initial digit 1, and the subscript
>> variant, making it confusable with a real letter/syllable PU.
>>
>> The initial proposal for number 10 was a PU with a dot, i.e. instead of
>> small-subscripting the PU glyph for TENS, the PU glyph is still used, but
>> it is the initial digit one (normally a vertical stroke) which is
>> superscripted as a smaller tick (and in my opinion this tick should join
>> with the letter PU, just as the other digits+TENS are displayed by
>> attaching the TENS subscript to the standard digit).
>>
> Reading the proposal again, there is a mention that the glyph for 10
> (puu) may be related to the one for PU (see page 3). They look really
> similar and both have the same dot above, but the difference is the extent
> of the vertical line on the right side. The normal way to write 10 does NOT
> include a digit 1 (see the discussion at the end of p. 4, where it is
> explicitly stated), hence the confusion about the proper encoding of the
> number 10.
>
>
> [...]
>
>>
>> But encoding a standalone digit 10 would have been better
>>
> It has certainly been considered, and one can guess from the ad-hoc
> document that many solutions have been evaluated and defended during this
> meeting, and the final decision was a practical compromise. The problem
> with the standalone number 10 is that native users of the script see it
> as the same symbol as the TENS number, with an inherent dot which disappears
> when combined with something else.


So the error was to encode the TENS as a combining character instead of a
standalone character, which would have just created a ligature when it
follows a digit from 2 to 9.

If TENS had been a normal digit (non combining) the sequence from 1 to 10
would be uninterrupted and encoded with just one character.

Between 11 and 19, the TEENS would still be needed as a combining character
after a digit 1-9 (it does not exist in standalone, except after a SPACE or
NBSP or a DOTTED CIRCLE to show it as a spacing character like we do for
usual diacritics).

Then for 20, 30, ... 90, we would have just encoded TWO..NINE, TEN (as a
contextual ligature). No ligature was needed for 10 (only encoded as the
single character TEN).

But now that TEN is a combining character we then need to use NBSP,TEN...
or ONE,TEN! I'm not convinced, given what you say, that this insertion of
ONE is conceptually correct if the language is perceived as not
differentiating the character when it is used in isolation for 10 or in
combination with TWO for 20.

Having to insert a ONE before TEN looks like a Unicode-specific quirk not
matching the logical perception of the script.

From lang.support at gmail.com  Fri Jun 10 16:51:27 2016
From: lang.support at gmail.com (Andrew Cunningham)
Date: Sat, 11 Jun 2016 07:51:27 +1000
Subject: Mende Kikakui Number 10
In-Reply-To: <CAGa7JC377FikiHDVDFC9+DhTaj6JtJp0oTbN_pBaTMjDafhwUA@mail.gmail.com>
References: <CAGJ7U-XCuGq1K9ki8YS92+0vMQ=cFwXYTP1aO=tT0tmbVnhaDA@mail.gmail.com>
 <CAGJ7U-XgwZB_fsyn9FcDawUfbpdM_S7pjDCy4P7eh=mG7mWiQg@mail.gmail.com>
 <575AD9E3.8060301@gmail.com>
 <CAGa7JC377FikiHDVDFC9+DhTaj6JtJp0oTbN_pBaTMjDafhwUA@mail.gmail.com>
Message-ID: <CAGJ7U-Xku5Lg2Ub8pvniJ2Q=7K3S3s_Ef=ewfOhKKQCyrOoXBg@mail.gmail.com>

Hi Philippe,

On Saturday, 11 June 2016, Philippe Verdy <verdy_p at wanadoo.fr> wrote:

> OK, <ONE;combining TEENS> represents 11, but <ONE;combining TENS> does not
clearly represent 10, and the proposals do not exhibit 10 with the same
glyph as PU (even if it is based on it, in fact the combining TENS is a
small subscript glyph variant of letter/syllable PU intended to mark
digits).
>

Mende Kikakui script displays a high degree of glyph variation. Some
variations are minor, some more substantive.

The syllable PU can be found as it is in the charts, it can be found
looking like the number 10. Other variations are also observed.

The ideal situation would have been to encode the number 10. But in its
absence, I guess ONE+TENS may be the approach. Even though it seems less
than ideal.

A.

-- 
Andrew Cunningham
lang.support at gmail.com

From doug at ewellic.org  Fri Jun 10 16:59:45 2016
From: doug at ewellic.org (Doug Ewell)
Date: Fri, 10 Jun 2016 14:59:45 -0700
Subject: Mende Kikakui Number 10
Message-ID: <20160610145945.665a7a7059d7ee80bb4d670165c8327d.382bc4359d.wbe@email03.godaddy.com>

How does one represent the values 100 and 1000 in Mende Kikakui? Is it
not with ONE + HUNDREDS and ONE + THOUSANDS respectively?

If so, then how is encoding 10 as ONE + TENS any different? Am I
missing something?

--
Doug Ewell | http://ewellic.org | Thornton, CO


From kenwhistler at att.net  Fri Jun 10 17:20:24 2016
From: kenwhistler at att.net (Ken Whistler)
Date: Fri, 10 Jun 2016 15:20:24 -0700
Subject: Mende Kikakui Number 10
In-Reply-To: <20160610145945.665a7a7059d7ee80bb4d670165c8327d.382bc4359d.wbe@email03.godaddy.com>
References: <20160610145945.665a7a7059d7ee80bb4d670165c8327d.382bc4359d.wbe@email03.godaddy.com>
Message-ID: <a5c1b743-613e-aa2c-8fcb-434bfb18d266@att.net>


On 6/10/2016 2:59 PM, Doug Ewell wrote:
> How does one represent the values 100 and 1000 in Mende Kikakui? Is it
> not with ONE + HUNDREDS and ONE + THOUSANDS respectively?
>
> If so, then how is encoding 10 as ONE + TENS any different? Am I
> missing something?
>
Nope, you got it right: 10 = <1E8C7, 1E8D1>. Put a ligature glyph for 10 
(with the dot) in your Mende Kikakui font. Problem solved.

--Ken
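
To make the parallel between 10, 100 and 1000 concrete, here is a small
Python sketch that pairs ONE with the combining mark for each rank. The
ONE and TENS code points are the ones quoted in this thread; the HUNDREDS
and THOUSANDS code points are taken from the code chart rather than from
this discussion, and the helper names are invented for the example.

```python
ONE = "\U0001E8C7"  # digit ONE, as cited in this thread

# Combining rank marks. Only TENS is quoted in this thread; the other two
# values are assumed from the Mende Kikakui code chart.
RANK_MARK = {
    10: "\U0001E8D1",    # TENS
    100: "\U0001E8D2",   # HUNDREDS (assumption: code chart value)
    1000: "\U0001E8D3",  # THOUSANDS (assumption: code chart value)
}


def one_times(rank):
    """Return the sequence ONE + rank mark for rank 10, 100 or 1000."""
    return ONE + RANK_MARK[rank]


def code_points(text):
    """Return the code points of text as uppercase hex strings."""
    return ["%04X" % ord(ch) for ch in text]
```

All three values come out as a two-code-point sequence of the same shape,
so 10 is handled exactly like 100 and 1000; only the glyph chosen by the
font differs.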

From everson at evertype.com  Fri Jun 10 17:23:20 2016
From: everson at evertype.com (Michael Everson)
Date: Fri, 10 Jun 2016 23:23:20 +0100
Subject: Mende Kikakui Number 10
In-Reply-To: <CAGJ7U-Xku5Lg2Ub8pvniJ2Q=7K3S3s_Ef=ewfOhKKQCyrOoXBg@mail.gmail.com>
References: <CAGJ7U-XCuGq1K9ki8YS92+0vMQ=cFwXYTP1aO=tT0tmbVnhaDA@mail.gmail.com>
 <CAGJ7U-XgwZB_fsyn9FcDawUfbpdM_S7pjDCy4P7eh=mG7mWiQg@mail.gmail.com>
 <575AD9E3.8060301@gmail.com>
 <CAGa7JC377FikiHDVDFC9+DhTaj6JtJp0oTbN_pBaTMjDafhwUA@mail.gmail.com>
 <CAGJ7U-Xku5Lg2Ub8pvniJ2Q=7K3S3s_Ef=ewfOhKKQCyrOoXBg@mail.gmail.com>
Message-ID: <31AE4EE7-D2B3-492F-8CD6-E9333BCF1B5B@evertype.com>

We encoded MYANMAR LETTER WA and MYANMAR DIGIT ZERO separately because the latter is used in decimal arithmetic, which is essential and well supported by computers.

Mende Kikakui has no ZERO. This is a fault, and they would do well to devise one. An oval with a line through it like ? would do. But they don't have this. We have in the proposal an image of a tax document of some sort. This has not been transliterated and translated. It may or may not contain the number "10".

MENDE KIKAKUI SYLLABLE PU is the appropriate character to use for a non-decimal 10. The dot or not-dot or the length of the bar is not relevant; I understand that both occur for both entities. Do we have other LETTER characters which are disunified from NUMBER (as opposed to DIGIT) characters? If so, then consistency might be a reason to disunify them. 


From everson at evertype.com  Fri Jun 10 17:25:46 2016
From: everson at evertype.com (Michael Everson)
Date: Fri, 10 Jun 2016 23:25:46 +0100
Subject: Mende Kikakui Number 10
In-Reply-To: <a5c1b743-613e-aa2c-8fcb-434bfb18d266@att.net>
References: <20160610145945.665a7a7059d7ee80bb4d670165c8327d.382bc4359d.wbe@email03.godaddy.com>
 <a5c1b743-613e-aa2c-8fcb-434bfb18d266@att.net>
Message-ID: <23FCE93C-0B10-4469-9F80-A8C15121F1A9@evertype.com>

On 10 Jun 2016, at 23:20, Ken Whistler <kenwhistler at att.net> wrote:
> 
> On 6/10/2016 2:59 PM, Doug Ewell wrote:
>> How does one represent the values 100 and 1000 in Mende Kikakui? Is it
>> not with ONE + HUNDREDS and ONE + THOUSANDS respectively?
>> 
>> If so, then how is encoding 10 as ONE + TENS any different? Am I
>> missing something?
> 
> Nope, you got it right: 10 = <1E8C7, 1E8D1>. Put ligature glyph for 10 (with the dot) in your Mende
> Kikakui font. Problem solved.

If that's better than just using SYLLABLE PU, OK, but please document that in the standard. It is a little non-intuitive. Well, at least a little non-obvious.

:-)

Michael

From kenwhistler at att.net  Fri Jun 10 17:34:15 2016
From: kenwhistler at att.net (Ken Whistler)
Date: Fri, 10 Jun 2016 15:34:15 -0700
Subject: Mende Kikakui Number 10
In-Reply-To: <31AE4EE7-D2B3-492F-8CD6-E9333BCF1B5B@evertype.com>
References: <CAGJ7U-XCuGq1K9ki8YS92+0vMQ=cFwXYTP1aO=tT0tmbVnhaDA@mail.gmail.com>
 <CAGJ7U-XgwZB_fsyn9FcDawUfbpdM_S7pjDCy4P7eh=mG7mWiQg@mail.gmail.com>
 <575AD9E3.8060301@gmail.com>
 <CAGa7JC377FikiHDVDFC9+DhTaj6JtJp0oTbN_pBaTMjDafhwUA@mail.gmail.com>
 <CAGJ7U-Xku5Lg2Ub8pvniJ2Q=7K3S3s_Ef=ewfOhKKQCyrOoXBg@mail.gmail.com>
 <31AE4EE7-D2B3-492F-8CD6-E9333BCF1B5B@evertype.com>
Message-ID: <9fa22860-7392-b9f5-52ce-7e5f16bd2942@att.net>


On 6/10/2016 3:23 PM, Michael Everson wrote:
> Mende Kikakui has no ZERO. This is a fault, and they would do well to devise one. An oval with a line through it like ? would do. But they don't have this.

I concur with that. If the users of this system decide that they want to 
have a decimal radix system instead of the system documented with the 
combining marks for decimal ranks, then adding a zero at 1E8C6 would be 
feasible. That's why we left a gap at that point in the chart.
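
As a small illustration of why a gap at that position is convenient: with
ONE at 1E8C7, a digit's value is just its offset from 1E8C6, so a zero
placed there would fit the same arithmetic. The Python sketch below assumes
the digits ONE..NINE are contiguous; ZERO_SLOT is only a label for the
unassigned position, not an encoded character.

```python
ZERO_SLOT = 0x1E8C6  # the gap mentioned above; nothing is encoded here


def digit_value(ch):
    """Return the value 1..9 of a Mende Kikakui digit character."""
    value = ord(ch) - ZERO_SLOT
    if not 1 <= value <= 9:
        raise ValueError("not a Mende Kikakui digit: U+%04X" % ord(ch))
    return value
```

So digit_value("\U0001E8C7") is 1 and digit_value("\U0001E8CF") is 9, and
the offset 0 is exactly the position left open.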

>
> MENDE KIKAKUI SYLLABLE PU is the appropriate character to use for a non-decimal 10. The dot or not-dot or the length of the bar is not relevant; I understand that both occur for both entities. Do we have other LETTER characters which are disunified from NUMBER (as opposed to DIGIT) characters? If so, then consistency might be a reason to disunify them.
>

I disagree about that. There is no reason to depart from the logic of 
the system for this one value. Add one ligature glyph to your font for 
the sequence for 10, and you're done.

--Ken


From lang.support at gmail.com  Fri Jun 10 19:34:16 2016
From: lang.support at gmail.com (Andrew Cunningham)
Date: Sat, 11 Jun 2016 10:34:16 +1000
Subject: Mende Kikakui Number 10
In-Reply-To: <9fa22860-7392-b9f5-52ce-7e5f16bd2942@att.net>
References: <CAGJ7U-XCuGq1K9ki8YS92+0vMQ=cFwXYTP1aO=tT0tmbVnhaDA@mail.gmail.com>
 <CAGJ7U-XgwZB_fsyn9FcDawUfbpdM_S7pjDCy4P7eh=mG7mWiQg@mail.gmail.com>
 <575AD9E3.8060301@gmail.com>
 <CAGa7JC377FikiHDVDFC9+DhTaj6JtJp0oTbN_pBaTMjDafhwUA@mail.gmail.com>
 <CAGJ7U-Xku5Lg2Ub8pvniJ2Q=7K3S3s_Ef=ewfOhKKQCyrOoXBg@mail.gmail.com>
 <31AE4EE7-D2B3-492F-8CD6-E9333BCF1B5B@evertype.com>
 <9fa22860-7392-b9f5-52ce-7e5f16bd2942@att.net>
Message-ID: <CAGJ7U-X63VqRhu6XOaGSEGtv-9AzTcJ0iEKa1iWDTj=q2zd6Yw@mail.gmail.com>

On Saturday, 11 June 2016, Ken Whistler <kenwhistler at att.net> wrote:
>
> I disagree about that. There is no reason to depart from the logic of the
system for this one value. Add one ligature glyph to your font for the
sequence for 10, and you're done.
>
>

There is the logic of how Kikakui numbers are encoded in Unicode and there
is the internal logic of the numeral system itself. They are not
necessarily the same.

There are too few descriptions of the system for me to be definitive ....
but the number ten seems to hold a unique position within the numeral system.

A.

-- 
Andrew Cunningham
lang.support at gmail.com

From everson at evertype.com  Fri Jun 10 19:47:11 2016
From: everson at evertype.com (Michael Everson)
Date: Sat, 11 Jun 2016 01:47:11 +0100
Subject: Mende Kikakui Number 10
In-Reply-To: <9fa22860-7392-b9f5-52ce-7e5f16bd2942@att.net>
References: <CAGJ7U-XCuGq1K9ki8YS92+0vMQ=cFwXYTP1aO=tT0tmbVnhaDA@mail.gmail.com>
 <CAGJ7U-XgwZB_fsyn9FcDawUfbpdM_S7pjDCy4P7eh=mG7mWiQg@mail.gmail.com>
 <575AD9E3.8060301@gmail.com>
 <CAGa7JC377FikiHDVDFC9+DhTaj6JtJp0oTbN_pBaTMjDafhwUA@mail.gmail.com>
 <CAGJ7U-Xku5Lg2Ub8pvniJ2Q=7K3S3s_Ef=ewfOhKKQCyrOoXBg@mail.gmail.com>
 <31AE4EE7-D2B3-492F-8CD6-E9333BCF1B5B@evertype.com>
 <9fa22860-7392-b9f5-52ce-7e5f16bd2942@att.net>
Message-ID: <C4933BEB-82D2-4515-BF8E-07AB4B6B7765@evertype.com>

On 10 Jun 2016, at 23:34, Ken Whistler <kenwhistler at att.net> wrote:
> On 6/10/2016 3:23 PM, Michael Everson wrote:
>> Mende Kikakui has no ZERO. This is a fault, and they would do well to devise one. An oval with a line through it like ? would do. But they don't have this.
> 
> I concur with that. If the users of this system decide that they want to have a decimal radix system instead of the system documented with the combining marks for decimal ranks, then adding a zero at 1E8C6 would be feasible. That's why we left a gap at that point in the chart.

Indeed!

>> MENDE KIKAKUI SYLLABLE PU is the appropriate character to use for a non-decimal 10. The dot or not-dot or the length of the bar is not relevant; I understand that both occur for both entities. Do we have other LETTER characters which are disunified from NUMBER (as opposed to DIGIT) characters? If so, then consistency might be a reason to disunify them.
> 
> I disagree about that. There is no reason to depart from the logic of the system for this one value. Add one ligature glyph to your font for the sequence for 10, and you're done.

You're right about that. I hadn't considered the ligature being structurally appropriate for this usage. (It would have been more obvious if Andrew had given the character names alongside the code positions; I hadn't looked it up yet.)

Michael

From kenwhistler at att.net  Fri Jun 10 19:50:19 2016
From: kenwhistler at att.net (Ken Whistler)
Date: Fri, 10 Jun 2016 17:50:19 -0700
Subject: Mende Kikakui Number 10
In-Reply-To: <CAGJ7U-X63VqRhu6XOaGSEGtv-9AzTcJ0iEKa1iWDTj=q2zd6Yw@mail.gmail.com>
References: <CAGJ7U-XCuGq1K9ki8YS92+0vMQ=cFwXYTP1aO=tT0tmbVnhaDA@mail.gmail.com>
 <CAGJ7U-XgwZB_fsyn9FcDawUfbpdM_S7pjDCy4P7eh=mG7mWiQg@mail.gmail.com>
 <575AD9E3.8060301@gmail.com>
 <CAGa7JC377FikiHDVDFC9+DhTaj6JtJp0oTbN_pBaTMjDafhwUA@mail.gmail.com>
 <CAGJ7U-Xku5Lg2Ub8pvniJ2Q=7K3S3s_Ef=ewfOhKKQCyrOoXBg@mail.gmail.com>
 <31AE4EE7-D2B3-492F-8CD6-E9333BCF1B5B@evertype.com>
 <9fa22860-7392-b9f5-52ce-7e5f16bd2942@att.net>
 <CAGJ7U-X63VqRhu6XOaGSEGtv-9AzTcJ0iEKa1iWDTj=q2zd6Yw@mail.gmail.com>
Message-ID: <02b4e913-0253-b4d0-8eb8-3c0a520abf93@att.net>


On 6/10/2016 5:34 PM, Andrew Cunningham wrote:
> There are too few descriptions of the system for me to be definitive 
> .... but the number ten seems to hold a unique position within the 
> numeral system.

As does the number 10 in every decimal numeral system. ;-)

But that doesn't automatically require that it be *encoded* with a 
single character. After all the number 10 in the European decimal 
numeral system is also represented with a character sequence: <0031, 0030>.
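
The comparison can be spelled out in a few lines of Python. This is just an
illustration; the Mende sequence is the ONE + TENS pair quoted earlier in
this thread, and code_points is a throwaway helper.

```python
def code_points(text):
    """Return the code points of text as uppercase hex strings."""
    return ["%04X" % ord(ch) for ch in text]


european_ten = "10"                 # DIGIT ONE + DIGIT ZERO
mende_ten = "\U0001E8C7\U0001E8D1"  # ONE + combining TENS

print(code_points(european_ten))  # ['0031', '0030']
print(code_points(mende_ten))     # ['1E8C7', '1E8D1']
```

In both cases the number 10 is two code points long; neither system needs a
dedicated character for it.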

--Ken


From lang.support at gmail.com  Fri Jun 10 20:47:39 2016
From: lang.support at gmail.com (Andrew Cunningham)
Date: Sat, 11 Jun 2016 11:47:39 +1000
Subject: Mende Kikakui Number 10
In-Reply-To: <02b4e913-0253-b4d0-8eb8-3c0a520abf93@att.net>
References: <CAGJ7U-XCuGq1K9ki8YS92+0vMQ=cFwXYTP1aO=tT0tmbVnhaDA@mail.gmail.com>
 <CAGJ7U-XgwZB_fsyn9FcDawUfbpdM_S7pjDCy4P7eh=mG7mWiQg@mail.gmail.com>
 <575AD9E3.8060301@gmail.com>
 <CAGa7JC377FikiHDVDFC9+DhTaj6JtJp0oTbN_pBaTMjDafhwUA@mail.gmail.com>
 <CAGJ7U-Xku5Lg2Ub8pvniJ2Q=7K3S3s_Ef=ewfOhKKQCyrOoXBg@mail.gmail.com>
 <31AE4EE7-D2B3-492F-8CD6-E9333BCF1B5B@evertype.com>
 <9fa22860-7392-b9f5-52ce-7e5f16bd2942@att.net>
 <CAGJ7U-X63VqRhu6XOaGSEGtv-9AzTcJ0iEKa1iWDTj=q2zd6Yw@mail.gmail.com>
 <02b4e913-0253-b4d0-8eb8-3c0a520abf93@att.net>
Message-ID: <CAGJ7U-VwFGG7J4FpwuxMzZp53oehwL6_XHftMAyVnaUFekKYCQ@mail.gmail.com>

I am not suggesting it needs to be encoded. And I did suggest that using
the digit one and the symbol for tens was an option.

It can be done via a ligature. It would have to be a required ligature,
since other ligature types may or may not be enabled in various contexts,
and we don't want default substitution and mark positioning to generate a
non-ligature equivalent.

A.

And it will be interesting to see which rendering engines handle Kikakui.

A.


On Saturday, 11 June 2016, Ken Whistler <kenwhistler at att.net> wrote:
>
> On 6/10/2016 5:34 PM, Andrew Cunningham wrote:
>>
>> There are too few descriptions of the system for me to be definitive
>> .... but the number ten seems to hold a unique position within the
>> numeral system.
>
> As does the number 10 in every decimal numeral system. ;-)
>
> But that doesn't automatically require that it be *encoded* with a single
> character. After all the number 10 in the European decimal numeral system
> is also represented with a character sequence: <0031, 0030>.
>
> --Ken
>
>

-- 
Andrew Cunningham
lang.support at gmail.com

From everson at evertype.com  Fri Jun 10 21:29:51 2016
From: everson at evertype.com (Michael Everson)
Date: Sat, 11 Jun 2016 03:29:51 +0100
Subject: Mende Kikakui Number 10
In-Reply-To: <CAGJ7U-VwFGG7J4FpwuxMzZp53oehwL6_XHftMAyVnaUFekKYCQ@mail.gmail.com>
References: <CAGJ7U-XCuGq1K9ki8YS92+0vMQ=cFwXYTP1aO=tT0tmbVnhaDA@mail.gmail.com>
 <CAGJ7U-XgwZB_fsyn9FcDawUfbpdM_S7pjDCy4P7eh=mG7mWiQg@mail.gmail.com>
 <575AD9E3.8060301@gmail.com>
 <CAGa7JC377FikiHDVDFC9+DhTaj6JtJp0oTbN_pBaTMjDafhwUA@mail.gmail.com>
 <CAGJ7U-Xku5Lg2Ub8pvniJ2Q=7K3S3s_Ef=ewfOhKKQCyrOoXBg@mail.gmail.com>
 <31AE4EE7-D2B3-492F-8CD6-E9333BCF1B5B@evertype.com>
 <9fa22860-7392-b9f5-52ce-7e5f16bd2942@att.net>
 <CAGJ7U-X63VqRhu6XOaGSEGtv-9AzTcJ0iEKa1iWDTj=q2zd6Yw@mail.gmail.com>
 <02b4e913-0253-b4d0-8eb8-3c0a520abf93@att.net>
 <CAGJ7U-VwFGG7J4FpwuxMzZp53oehwL6_XHftMAyVnaUFekKYCQ@mail.gmail.com>
Message-ID: <61B082D3-ECFE-4162-B90F-FB65EDAC5E5B@evertype.com>

On 11 Jun 2016, at 02:47, Andrew Cunningham <lang.support at gmail.com> wrote:

> It can be done via a ligature. It would have to be a required ligature. Since other ligature types may or may not be enabled in various contexts. And we dont want default substitution and mark positioning to generate a non-ligature equivalent. 

Aren’t all of the number combinations required ligatures?

Michael

From lang.support at gmail.com  Fri Jun 10 22:25:49 2016
From: lang.support at gmail.com (Andrew Cunningham)
Date: Sat, 11 Jun 2016 13:25:49 +1000
Subject: Mende Kikakui Number 10
In-Reply-To: <61B082D3-ECFE-4162-B90F-FB65EDAC5E5B@evertype.com>
References: <CAGJ7U-XCuGq1K9ki8YS92+0vMQ=cFwXYTP1aO=tT0tmbVnhaDA@mail.gmail.com>
 <CAGJ7U-XgwZB_fsyn9FcDawUfbpdM_S7pjDCy4P7eh=mG7mWiQg@mail.gmail.com>
 <575AD9E3.8060301@gmail.com>
 <CAGa7JC377FikiHDVDFC9+DhTaj6JtJp0oTbN_pBaTMjDafhwUA@mail.gmail.com>
 <CAGJ7U-Xku5Lg2Ub8pvniJ2Q=7K3S3s_Ef=ewfOhKKQCyrOoXBg@mail.gmail.com>
 <31AE4EE7-D2B3-492F-8CD6-E9333BCF1B5B@evertype.com>
 <9fa22860-7392-b9f5-52ce-7e5f16bd2942@att.net>
 <CAGJ7U-X63VqRhu6XOaGSEGtv-9AzTcJ0iEKa1iWDTj=q2zd6Yw@mail.gmail.com>
 <02b4e913-0253-b4d0-8eb8-3c0a520abf93@att.net>
 <CAGJ7U-VwFGG7J4FpwuxMzZp53oehwL6_XHftMAyVnaUFekKYCQ@mail.gmail.com>
 <61B082D3-ECFE-4162-B90F-FB65EDAC5E5B@evertype.com>
Message-ID: <CAGJ7U-UAVqgPcN5DaCLR5zCSytSdfE6nd-Es-M3Bc0JH7c2VOw@mail.gmail.com>

rlig is the quickest and easiest approach, but in theory it could be done
in other, more complicated ways.

There are currently no OpenType implementations that I know of, and no
known shapers. rlig hopefully works with general shapers, but which OpenType
features a script-specific shaper will expect is still unknown.
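
What a required-ligature lookup does can be sketched in a few lines of Python. This is only a toy model of the substitution step, not a real shaping engine or font format: the glyph names and the ligature table below are hypothetical, invented for illustration.

```python
# Hypothetical glyph names; a real font would define its own.
REQUIRED_LIGATURES = {
    ("digit_one", "number_tens"): "ten_ligature",
    ("digit_two", "number_tens"): "twenty_ligature",
}


def apply_ligatures(glyphs, table):
    """Replace glyph sequences found in `table`, longest match first."""
    longest = max((len(key) for key in table), default=1)
    out = []
    i = 0
    while i < len(glyphs):
        # Try the longest candidate sequence first, down to length 2.
        for n in range(min(longest, len(glyphs) - i), 1, -1):
            key = tuple(glyphs[i:i + n])
            if key in table:
                out.append(table[key])
                i += n
                break
        else:
            # No ligature starts here: keep the glyph unchanged.
            out.append(glyphs[i])
            i += 1
    return out


print(apply_ligatures(["digit_one", "number_tens"], REQUIRED_LIGATURES))
print(apply_ligatures(["digit_three"], REQUIRED_LIGATURES))
```

Because the table is applied unconditionally, the sketch mirrors the point of using a required ligature: the substitution does not depend on an optional feature being switched on.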

On Saturday, 11 June 2016, Michael Everson <everson at evertype.com> wrote:
> On 11 Jun 2016, at 02:47, Andrew Cunningham <lang.support at gmail.com> wrote:
>
>> It can be done via a ligature. It would have to be a required ligature.
>> Since other ligature types may or may not be enabled in various contexts.
>> And we dont want default substitution and mark positioning to generate a
>> non-ligature equivalent.
>
> Aren’t all of the number combinations required ligatures?
>
> Michael
>

-- 
Andrew Cunningham
lang.support at gmail.com

From asmusf at ix.netcom.com  Sat Jun 11 02:08:00 2016
From: asmusf at ix.netcom.com (Asmus Freytag (c))
Date: Sat, 11 Jun 2016 00:08:00 -0700
Subject: Mende Kikakui Number 10
In-Reply-To: <CAGJ7U-X63VqRhu6XOaGSEGtv-9AzTcJ0iEKa1iWDTj=q2zd6Yw@mail.gmail.com>
References: <CAGJ7U-XCuGq1K9ki8YS92+0vMQ=cFwXYTP1aO=tT0tmbVnhaDA@mail.gmail.com>
 <CAGJ7U-XgwZB_fsyn9FcDawUfbpdM_S7pjDCy4P7eh=mG7mWiQg@mail.gmail.com>
 <575AD9E3.8060301@gmail.com>
 <CAGa7JC377FikiHDVDFC9+DhTaj6JtJp0oTbN_pBaTMjDafhwUA@mail.gmail.com>
 <CAGJ7U-Xku5Lg2Ub8pvniJ2Q=7K3S3s_Ef=ewfOhKKQCyrOoXBg@mail.gmail.com>
 <31AE4EE7-D2B3-492F-8CD6-E9333BCF1B5B@evertype.com>
 <9fa22860-7392-b9f5-52ce-7e5f16bd2942@att.net>
 <CAGJ7U-X63VqRhu6XOaGSEGtv-9AzTcJ0iEKa1iWDTj=q2zd6Yw@mail.gmail.com>
Message-ID: <0ca74adc-d8b2-908c-c74f-2c10f2b8a354@ix.netcom.com>

An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20160611/c53bec6c/attachment.html>

From verdy_p at wanadoo.fr  Sat Jun 11 05:22:08 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sat, 11 Jun 2016 12:22:08 +0200
Subject: Mende Kikakui Number 10
In-Reply-To: <0ca74adc-d8b2-908c-c74f-2c10f2b8a354@ix.netcom.com>
References: <CAGJ7U-XCuGq1K9ki8YS92+0vMQ=cFwXYTP1aO=tT0tmbVnhaDA@mail.gmail.com>
 <CAGJ7U-XgwZB_fsyn9FcDawUfbpdM_S7pjDCy4P7eh=mG7mWiQg@mail.gmail.com>
 <575AD9E3.8060301@gmail.com>
 <CAGa7JC377FikiHDVDFC9+DhTaj6JtJp0oTbN_pBaTMjDafhwUA@mail.gmail.com>
 <CAGJ7U-Xku5Lg2Ub8pvniJ2Q=7K3S3s_Ef=ewfOhKKQCyrOoXBg@mail.gmail.com>
 <31AE4EE7-D2B3-492F-8CD6-E9333BCF1B5B@evertype.com>
 <9fa22860-7392-b9f5-52ce-7e5f16bd2942@att.net>
 <CAGJ7U-X63VqRhu6XOaGSEGtv-9AzTcJ0iEKa1iWDTj=q2zd6Yw@mail.gmail.com>
 <0ca74adc-d8b2-908c-c74f-2c10f2b8a354@ix.netcom.com>
Message-ID: <CAGa7JC0TKDLC0JKq7RGqU3Xp0Ozeb_5qB1U+eLiVkmqZ9DmGjA@mail.gmail.com>

Exactly, Unicode should not create its own logic about scripts or numeral
systems.

It looks as though encoding 10 as a pair (ONE + combining TENS) was a
severe conceptual error. It could have been avoided by encoding "TENS" NOT
as a combining character but as a regular number/digit TEN usable in
isolation and forming a contextual ligature with a preceding digit from TWO
to NINE.

Encoding 10 as (ONE + TENS) needlessly requires an artificial leading ONE.
This is purely a Unicode construction, foreign to the logic of the numeral
system.


2016-06-11 9:08 GMT+02:00 Asmus Freytag (c) <asmusf at ix.netcom.com>:

> On 6/10/2016 5:34 PM, Andrew Cunningham wrote:
>
> There is the logic of how kikakui numbers are encoded in Unicode and there
> is the internal logic of the numeral system itself. They are not
> necessarily the same.
>
> This statement should be framed!
>
> A./
>

From verdy_p at wanadoo.fr  Sat Jun 11 05:25:39 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sat, 11 Jun 2016 12:25:39 +0200
Subject: Mende Kikakui Number 10
In-Reply-To: <CAGa7JC0TKDLC0JKq7RGqU3Xp0Ozeb_5qB1U+eLiVkmqZ9DmGjA@mail.gmail.com>
References: <CAGJ7U-XCuGq1K9ki8YS92+0vMQ=cFwXYTP1aO=tT0tmbVnhaDA@mail.gmail.com>
 <CAGJ7U-XgwZB_fsyn9FcDawUfbpdM_S7pjDCy4P7eh=mG7mWiQg@mail.gmail.com>
 <575AD9E3.8060301@gmail.com>
 <CAGa7JC377FikiHDVDFC9+DhTaj6JtJp0oTbN_pBaTMjDafhwUA@mail.gmail.com>
 <CAGJ7U-Xku5Lg2Ub8pvniJ2Q=7K3S3s_Ef=ewfOhKKQCyrOoXBg@mail.gmail.com>
 <31AE4EE7-D2B3-492F-8CD6-E9333BCF1B5B@evertype.com>
 <9fa22860-7392-b9f5-52ce-7e5f16bd2942@att.net>
 <CAGJ7U-X63VqRhu6XOaGSEGtv-9AzTcJ0iEKa1iWDTj=q2zd6Yw@mail.gmail.com>
 <0ca74adc-d8b2-908c-c74f-2c10f2b8a354@ix.netcom.com>
 <CAGa7JC0TKDLC0JKq7RGqU3Xp0Ozeb_5qB1U+eLiVkmqZ9DmGjA@mail.gmail.com>
Message-ID: <CAGa7JC3sB=FD-__0QHZcJWhEemZxcYhgwR_TdOLU3tF3L0UjyQ@mail.gmail.com>

Note that this is most probably also true for the encoding of 100 as
ONE+HUNDREDS, where HUNDREDS should be a regular number usable in isolation
without the leading ONE. The same goes for THOUSANDS and the like, all
encoded as combining characters; the names themselves should not have taken
the plural.

I just hope they have combining class 0. Then the error is the assigned
general category C* which should have been N*.

Can we fix that so that isolated uses of TENS or HUNDREDS or others in the
series will NOT require any artificial leading digit ONE?
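
The assigned properties need not be guessed at; they can be read from the Unicode Character Database. A minimal sketch, assuming a Python whose bundled `unicodedata` covers Unicode 7.0 or later (the version that added Mende Kikakui); `describe` is a helper name introduced here for illustration.

```python
import unicodedata

# Look the characters up by name rather than hard-coding code points.
one = unicodedata.lookup("MENDE KIKAKUI DIGIT ONE")
tens = unicodedata.lookup("MENDE KIKAKUI COMBINING NUMBER TENS")


def describe(ch):
    """Collect the properties discussed in this thread for one character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "general_category": unicodedata.category(ch),
        "combining_class": unicodedata.combining(ch),
    }


for ch in (one, tens):
    print(describe(ch))
```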


2016-06-11 12:22 GMT+02:00 Philippe Verdy <verdy_p at wanadoo.fr>:

> Exactly, Unicode should not create its own logic about scripts or numeral
> systems.
>
> All looks like the encoding of 10 as a pair (ONE+combining TENS) was a
> severe conceptual error that could have been avoided by NOT encoding "TENS"
> as combining but as a regular number/digit TEN usable in isolation, and
> forming a contextual ligature with a previous digit from TWO to NINE.
>
> The encoding of 10 as (ONE+TENS) is superfluously needing an artificial
> leading ONE. This is purely a Unicode construction, foreign to the logic
> of the numeral system.
>
>
> 2016-06-11 9:08 GMT+02:00 Asmus Freytag (c) <asmusf at ix.netcom.com>:
>
>> On 6/10/2016 5:34 PM, Andrew Cunningham wrote:
>>
>> There is the logic of how kikakui numbers are encoded in Unicode and
>> there is the internal logic of the numeral system itself. They are not
>> necessarily the same.
>>
>> This statement should be framed!
>>
>> A./
>>
>
>

From charupdate at orange.fr  Sat Jun 11 17:12:57 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Sun, 12 Jun 2016 00:12:57 +0200 (CEST)
Subject: Mende Kikakui Number 10
In-Reply-To: <CAGa7JC0TKDLC0JKq7RGqU3Xp0Ozeb_5qB1U+eLiVkmqZ9DmGjA@mail.gmail.com>
References: <CAGJ7U-XCuGq1K9ki8YS92+0vMQ=cFwXYTP1aO=tT0tmbVnhaDA@mail.gmail.com>
 <CAGJ7U-XgwZB_fsyn9FcDawUfbpdM_S7pjDCy4P7eh=mG7mWiQg@mail.gmail.com>
 <575AD9E3.8060301@gmail.com>
 <CAGa7JC377FikiHDVDFC9+DhTaj6JtJp0oTbN_pBaTMjDafhwUA@mail.gmail.com>
 <CAGJ7U-Xku5Lg2Ub8pvniJ2Q=7K3S3s_Ef=ewfOhKKQCyrOoXBg@mail.gmail.com>
 <31AE4EE7-D2B3-492F-8CD6-E9333BCF1B5B@evertype.com>
 <9fa22860-7392-b9f5-52ce-7e5f16bd2942@att.net>
 <CAGJ7U-X63VqRhu6XOaGSEGtv-9AzTcJ0iEKa1iWDTj=q2zd6Yw@mail.gmail.com>
 <0ca74adc-d8b2-908c-c74f-2c10f2b8a354@ix.netcom.com>
 <CAGa7JC0TKDLC0JKq7RGqU3Xp0Ozeb_5qB1U+eLiVkmqZ9DmGjA@mail.gmail.com>
Message-ID: <1551343798.17257.1465683177287.JavaMail.www@wwinf1p21>

On Sat, 11 Jun 2016 12:25:39 +0200, Philippe Verdy wrote:
>
> Exactly, Unicode should not create its own logic about scripts or numeral systems.
>
> All looks like the encoding of 10 as a pair (ONE+combining TENS) was a severe 
> conceptual error that could have been avoided by NOT encoding "TENS" as combining 
> but as a regular number/digit TEN usable in isolation, and forming a contextual
> ligature with a previous digit from TWO to NINE.
>
> The encoding of 10 as (ONE+TENS) is superfluously needing an artificial leading
> ONE. This is purely a Unicode construction, foreign to the logic of the numeral
> system.
>


Seeing the discussion exhausted, I join Philippe Verdy in his hope,
and reinforce it by quoting Asmus Freytag on backward compatibility versus
enhancement, before raising another concern:

“If you add a feature to match behavior somewhere else,
it rarely pays to make that perform "better", because
it just means it's now different and no longer matches.
The exception is a feature for which you can establish
unambiguously that there is a metric of correctness or
a widely (universally?) shared expectation by users
as to the ideal behavior. In that case, being compatible
with a broken feature (or a random implementation of one)
may in fact be counter productive.”

http://www.unicode.org/mail-arch/unicode-ml/y2016-m03/0109.html

Being bound with stability guarantees, Unicode could eventually add a _new_

*1E8D7 MENDE KIKAKUI NUMBER TEN

Best wishes,

Marcel


From charupdate at orange.fr  Sat Jun 11 17:20:12 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Sun, 12 Jun 2016 00:20:12 +0200 (CEST)
Subject: Latin Letters Capital and Small Theta
Message-ID: <235625534.17310.1465683612359.JavaMail.www@wwinf1p21>

There is a recurrent idea that the Greek theta used to
write the Rromani language in International Standard orthography (as
well as a number of other languages) will be or ought to be encoded
as a separate casing pair in Unicode.

LATIN CAPITAL LETTER THETA and LATIN SMALL LETTER THETA
were part of Michael Everson’s 2012 proposal at
http://www.unicode.org/L2/L2012/12138-n4262-unifon.pdf
with the intended code points U+A7B0 and U+A7B1. While some characters
were retained, others were rejected, among them the Latin Theta pair,
but no mention of this rejection is found in the Non-Approval Notices.

Two years later this proposal was supported by
Denis Moyogo Jacquerye’s additional proposal at
http://www.unicode.org/L2/L2014/14202-latin-theta-delta.pdf
with a new rationale: the letters are required in the writing systems of
several natural languages.

On the sole criterion of glyph resemblance, two matching characters
already exist in Unicode:
03F4 GREEK CAPITAL THETA SYMBOL
03B8 GREEK SMALL LETTER THETA

Does the UTC consider it feasible to address the issue by implementing
a tailored casing pair for the related locales and adding an annotation
somewhere for the information of font designers, or can people expect to
see a successful proposal one day for LATIN CAPITAL LETTER THETA and
LATIN SMALL LETTER THETA? To date, this is not found in the Pipeline.
(Experience has shown, though, that the rejection of a given character in
one proposal is without prejudice to its acceptance as part of a later
proposal. That happened to LATIN CAPITAL LETTER SMALL CAPITAL I, found
already in Mr Everson’s 2012 proposal and now added to Unicode in 2016.)

The Greek Theta as an IPA character was incidentally discussed already in 
the following thread:
Unicode Mail List Archive: gamma as a phonetic symbol. 
(Sat Sep 27 2008 - 11:43:57 CDT). Retrieved June 10, 2016, from 
http://www.unicode.org/mail-arch/unicode-ml/y2008-m09/0072.html

According to Mr Everson in this thread, “Theta is perhaps the
hardest to argue for” disunification:
http://www.unicode.org/mail-arch/unicode-ml/y2008-m09/0076.html

Why that should be so is not obvious to me, however, because the capital
does not match the glyph expectations for the Romani International Standard
Latin script subset as referred to in
https://en.wikipedia.org/wiki/Romani_alphabets#International_Standard
and in more detail in
https://fr.wikipedia.org/wiki/Th%C3%AAta_latin
(so far available in French only, but one might wish to check
the picture anyway).

Consequently, AFAIK, to date the Greek Capital Theta Symbol is preferred
as the uppercase form, not the Greek Capital Theta. Using the Symbol variant
causes some oddities in data processing because the casing relationship
does not round-trip. This adds to the overall problem of cross-script usage.
Using several scripts to write one language contradicts one of the design
principles of Unicode.
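
The missing round trip can be observed with default (untailored) case mappings. A minimal sketch using Python's built-in string methods, which implement the default Unicode mappings; the variable names are introduced here for illustration.

```python
import unicodedata

capital_symbol = "\u03f4"      # GREEK CAPITAL THETA SYMBOL
small = capital_symbol.lower() # default lowercase mapping
back = small.upper()           # default uppercase mapping of the result

for ch in (capital_symbol, small, back):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# True only if uppercasing the lowercase form returns the original capital.
round_trips = back == capital_symbol
print(round_trips)
```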

I note, too, that in its International Standard Alphabet form, Romany is not
supported by the blocks up to Latin Extended-A, contrary to what TUS 8.0
states on page 296. This brings up the need to underscore that Unicode added
the H with háček (U+021E U+021F) for Finnish Romany in the Latin Extended-B
block.

However U+03F4 ( ϴ ) GREEK CAPITAL THETA SYMBOL was among the
subset of potentially obsolete characters found in the Archives of 
this List in the following e-mail:
http://www.unicode.org/mail-arch/unicode-ml/y2009-m01/0558.html

Solving this issue now is important in that the French Standard
Keyboard Layout will support Rromani Standard Latin script (along
with all European languages written in the Latin script). This topic being
about plain character encoding, I’ve finally decided to submit it
for your kind advice.

Marcel

From doug at ewellic.org  Sat Jun 11 19:20:12 2016
From: doug at ewellic.org (Doug Ewell)
Date: Sat, 11 Jun 2016 18:20:12 -0600
Subject: Latin Letters Capital and Small Theta
Message-ID: <6206A937BCA14C3C942388D7EFC9407A@DougEwell>

Marcel Schneider wrote:

> While some characters were retained, others were rejected, among which
> the Latin Theta pair, but no mention is found of this rejection in the
> Non-Approval Notices.

Lots of characters in proposals are rejected without rising to the level 
of explicit disapproval: "Look, we said NO, and don't ask us again." The 
Non-Approval Notices page starts with an extensive description of the 
difference.

At the same time, note that a few proposals, such as LATIN CAPITAL 
LETTER SHARP S, have risen phoenix-like from the ranks of 
non-approvaldom to become genuine encoded characters.

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸


From lang.support at gmail.com  Sat Jun 11 22:25:17 2016
From: lang.support at gmail.com (Andrew Cunningham)
Date: Sun, 12 Jun 2016 13:25:17 +1000
Subject: Mende Kikakui Number 10
In-Reply-To: <1551343798.17257.1465683177287.JavaMail.www@wwinf1p21>
References: <CAGJ7U-XCuGq1K9ki8YS92+0vMQ=cFwXYTP1aO=tT0tmbVnhaDA@mail.gmail.com>
 <CAGJ7U-XgwZB_fsyn9FcDawUfbpdM_S7pjDCy4P7eh=mG7mWiQg@mail.gmail.com>
 <575AD9E3.8060301@gmail.com>
 <CAGa7JC377FikiHDVDFC9+DhTaj6JtJp0oTbN_pBaTMjDafhwUA@mail.gmail.com>
 <CAGJ7U-Xku5Lg2Ub8pvniJ2Q=7K3S3s_Ef=ewfOhKKQCyrOoXBg@mail.gmail.com>
 <31AE4EE7-D2B3-492F-8CD6-E9333BCF1B5B@evertype.com>
 <9fa22860-7392-b9f5-52ce-7e5f16bd2942@att.net>
 <CAGJ7U-X63VqRhu6XOaGSEGtv-9AzTcJ0iEKa1iWDTj=q2zd6Yw@mail.gmail.com>
 <0ca74adc-d8b2-908c-c74f-2c10f2b8a354@ix.netcom.com>
 <CAGa7JC0TKDLC0JKq7RGqU3Xp0Ozeb_5qB1U+eLiVkmqZ9DmGjA@mail.gmail.com>
 <1551343798.17257.1465683177287.JavaMail.www@wwinf1p21>
Message-ID: <CAGJ7U-U=o4DqVNyO9hSxK9Dhr4gw+b6ToifxRiDxfRS=mjWcbA@mail.gmail.com>

Marcel, it isn't so much that the conversation was exhausted, rather that
the original question has been sufficiently answered.

A.


On Sunday, 12 June 2016, Marcel Schneider <charupdate at orange.fr> wrote:
> On Sat, 11 Jun 2016 12:25:39 +0200, Philippe Verdy wrote:
>>
>> Exactly, Unicode should not create its own logic about scripts or
>> numeral systems.
>>
>> All looks like the encoding of 10 as a pair (ONE+combining TENS) was a
>> severe conceptual error that could have been avoided by NOT encoding
>> "TENS" as combining but as a regular number/digit TEN usable in
>> isolation, and forming a contextual ligature with a previous digit from
>> TWO to NINE.
>>
>> The encoding of 10 as (ONE+TENS) is superfluously needing an artificial
>> leading ONE. This is purely a Unicode construction, foreign to the logic
>> of the numeral system.
>>
>
>
> Seeing the discussion exhausted, I join my hope to Philippe Verdy’s,
> and reinforce by quoting Asmus Freytag on backcompat vs enhancement,
> before bringing another concern:
>
> “If you add a feature to match behavior somewhere else,
> it rarely pays to make that perform "better", because
> it just means it's now different and no longer matches.
> The exception is a feature for which you can establish
> unambiguously that there is a metric of correctness or
> a widely (universally?) shared expectation by users
> as to the ideal behavior. In that case, being compatible
> with a broken feature (or a random implementation of one)
> may in fact be counter productive.”
>
> http://www.unicode.org/mail-arch/unicode-ml/y2016-m03/0109.html
>
> Being bound with stability guarantees, Unicode could eventually add a _new_
>
> *1E8D7 MENDE KIKAKUI NUMBER TEN
>
> Best wishes,
>
> Marcel
>
>

-- 
Andrew Cunningham
lang.support at gmail.com

From charupdate at orange.fr  Sun Jun 12 05:34:59 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Sun, 12 Jun 2016 12:34:59 +0200 (CEST)
Subject: Scheduling Public Reviews (was: Re: Mende Kikakui Number 10)
In-Reply-To: <CAGJ7U-U=o4DqVNyO9hSxK9Dhr4gw+b6ToifxRiDxfRS=mjWcbA@mail.gmail.com>
References: <CAGJ7U-XCuGq1K9ki8YS92+0vMQ=cFwXYTP1aO=tT0tmbVnhaDA@mail.gmail.com>
 <CAGJ7U-XgwZB_fsyn9FcDawUfbpdM_S7pjDCy4P7eh=mG7mWiQg@mail.gmail.com>
 <575AD9E3.8060301@gmail.com>
 <CAGa7JC377FikiHDVDFC9+DhTaj6JtJp0oTbN_pBaTMjDafhwUA@mail.gmail.com>
 <CAGJ7U-Xku5Lg2Ub8pvniJ2Q=7K3S3s_Ef=ewfOhKKQCyrOoXBg@mail.gmail.com>
 <31AE4EE7-D2B3-492F-8CD6-E9333BCF1B5B@evertype.com>
 <9fa22860-7392-b9f5-52ce-7e5f16bd2942@att.net>
 <CAGJ7U-X63VqRhu6XOaGSEGtv-9AzTcJ0iEKa1iWDTj=q2zd6Yw@mail.gmail.com>
 <0ca74adc-d8b2-908c-c74f-2c10f2b8a354@ix.netcom.com>
 <CAGa7JC0TKDLC0JKq7RGqU3Xp0Ozeb_5qB1U+eLiVkmqZ9DmGjA@mail.gmail.com>
 <1551343798.17257.1465683177287.JavaMail.www@wwinf1p21>
 <CAGJ7U-U=o4DqVNyO9hSxK9Dhr4gw+b6ToifxRiDxfRS=mjWcbA@mail.gmail.com>
Message-ID: <1746124551.3342.1465727699179.JavaMail.www@wwinf1g04>

On Sun, 12 Jun 2016 13:25:17 +1000, Andrew Cunningham wrote:

> Marcel, it isn't so much that the conversation was exhausted, rather that
> the original question has been sufficiently answered.

I understand the difference now. Anyway, I didn’t consider the issue settled.

Moreover, the Mende Kikakui number encoding flaw pointed out in the original thread
would IMHO have been far, far less likely to occur if the first public review were
scheduled at a more useful stage than beta. I don’t know whether people feel they
are being taken seriously when their feedback is solicited while almost all the salient
parameters are already immutable. And I guess that some of those who are ready to spend
part of their time for the sake of Unicode and character encoding won’t necessarily
do so unless they are solicited and given the material in a convenient format.

Perhaps a Public Alpha Review would prevent things from going wrong? Beta would still
be reviewed afterwards, of course, but in a less frustrating manner.

Marcel


From charupdate at orange.fr  Sun Jun 12 05:41:10 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Sun, 12 Jun 2016 12:41:10 +0200 (CEST)
Subject: Latin Letters Capital and Small Theta
In-Reply-To: <6206A937BCA14C3C942388D7EFC9407A@DougEwell>
References: <6206A937BCA14C3C942388D7EFC9407A@DougEwell>
Message-ID: <748151443.3396.1465728070818.JavaMail.www@wwinf1g04>

On Sat, 11 Jun 2016 18:20:12 -0600, Doug Ewell wrote:

> Marcel Schneider wrote:
> 
>> While some characters were retained, others were rejected, among which
>> the Latin Theta pair, but no mention is found of this rejection in the
>> Non-Approval Notices.
> 
> Lots of characters in proposals are rejected without rising to the level
> of explicit disapproval: "Look, we said NO, and don't ask us again." The
> Non-Approval Notices page starts with an extensive description of the
> difference.
> 
> At the same time, note that a few proposals, such as LATIN CAPITAL
> LETTER SHARP S, have risen phoenix-like from the ranks of
> non-approvaldom to become genuine encoded characters. 

Thank you for these hints, which moreover remind me of the persistent case folding
issue around the LATIN CAPITAL LETTER SHARP S, which does not round-trip either
and needs to be tailored, because the uppercase mapping of its small letter must be
kept stable. Seen from here, tailoring the Greek small theta so that it behaves the
Latin way becomes quite obvious:

Without tailoring:
ẞ → ß → SS
ϴ → θ → Θ

With tailoring:
ẞ → ß → ẞ
ϴ → θ → ϴ
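
Such a tailoring can be sketched as a per-character override applied on top of the default uppercase mapping. This is an illustration only: `tailored_upper` and the `TAILORING` table are hypothetical names, not part of any existing library or locale data.

```python
# Hypothetical tailoring table: small letter -> preferred capital.
TAILORING = {
    "\u00df": "\u1e9e",  # sharp s -> capital sharp S
    "\u03b8": "\u03f4",  # small theta -> capital theta symbol
}


def tailored_upper(text, overrides):
    """Uppercase `text`, preferring `overrides` to the default mapping."""
    return "".join(overrides.get(ch, ch.upper()) for ch in text)


for capital in ("\u1e9e", "\u03f4"):
    small = capital.lower()
    default_result = small.upper()
    tailored_result = tailored_upper(small, TAILORING)
    # Report whether each mapping brings us back to the starting capital.
    print(capital, small, default_result == capital, tailored_result == capital)
```

Only the uppercase direction is overridden; the default lowercase mappings already lead from both capitals to the expected small letters.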

Now I’m rather inclined to believe that theta-using Latin script writers are eventually
better served by the UTC’s not retaining a separate Latin theta, because
tailoring this custom Greek casing pair presumably makes for a more streamlined
implementation and a more user-friendly result, given that the font issue is
avoided. All the more so since Greek letters are also preferred for units like ohm
and prefixes like micro, no matter what script they are used in.

[BTW, on Sun, 12 Jun 2016 00:20:12 +0200 (CEST) I wrote:
> According to Mr Everson in this thread, “Theta is perhaps the
> hardest to argue for” disunification:
> http://www.unicode.org/mail-arch/unicode-ml/y2008-m09/0076.html
> 
> Why so, is however non-obvious to me […]

But it is obvious, since the thread was about IPA, so uppercase was not discussed.]


The question still remains whether in practice this working model could be
unanimously accepted, given that at least someone prefers the regular
Greek capital, presumably to get around the case folding issue.

Marcel


From daniel.buenzli at erratique.ch  Sun Jun 12 08:26:30 2016
From: daniel.buenzli at erratique.ch (=?utf-8?Q?Daniel_B=C3=BCnzli?=)
Date: Sun, 12 Jun 2016 14:26:30 +0100
Subject: 9.0.0 segmentation and line breaks on the empty string
Message-ID: <46731C3271204F11AEE86273A1BC9922@erratique.ch>

Hello,  

I notice that in 9.0.0, UAX29 segmentations no longer report boundaries on the empty string, while UAX14 still reports a hard line break on it. Is this intended? And what is the rationale behind these changes and non-changes?

While I think that the proposed UAX29 behaviour is the better one, these kinds of changes to special cases make it easy to break assumptions made by client code, so it would be better if these things did not change too often. Hence my request: shouldn't UAX14 also report no breaks on the empty string?

Best, 

Daniel


From asmusf at ix.netcom.com  Sun Jun 12 10:49:15 2016
From: asmusf at ix.netcom.com (Asmus Freytag (c))
Date: Sun, 12 Jun 2016 08:49:15 -0700
Subject: compatibility features (was: Mende Kikakui Number 10)
In-Reply-To: <CAGJ7U-U=o4DqVNyO9hSxK9Dhr4gw+b6ToifxRiDxfRS=mjWcbA@mail.gmail.com>
References: <CAGJ7U-XCuGq1K9ki8YS92+0vMQ=cFwXYTP1aO=tT0tmbVnhaDA@mail.gmail.com>
 <CAGJ7U-XgwZB_fsyn9FcDawUfbpdM_S7pjDCy4P7eh=mG7mWiQg@mail.gmail.com>
 <575AD9E3.8060301@gmail.com>
 <CAGa7JC377FikiHDVDFC9+DhTaj6JtJp0oTbN_pBaTMjDafhwUA@mail.gmail.com>
 <CAGJ7U-Xku5Lg2Ub8pvniJ2Q=7K3S3s_Ef=ewfOhKKQCyrOoXBg@mail.gmail.com>
 <31AE4EE7-D2B3-492F-8CD6-E9333BCF1B5B@evertype.com>
 <9fa22860-7392-b9f5-52ce-7e5f16bd2942@att.net>
 <CAGJ7U-X63VqRhu6XOaGSEGtv-9AzTcJ0iEKa1iWDTj=q2zd6Yw@mail.gmail.com>
 <0ca74adc-d8b2-908c-c74f-2c10f2b8a354@ix.netcom.com>
 <CAGa7JC0TKDLC0JKq7RGqU3Xp0Ozeb_5qB1U+eLiVkmqZ9DmGjA@mail.gmail.com>
 <1551343798.17257.1465683177287.JavaMail.www@wwinf1p21>
 <CAGJ7U-U=o4DqVNyO9hSxK9Dhr4gw+b6ToifxRiDxfRS=mjWcbA@mail.gmail.com>
Message-ID: <098f04c6-a647-73e0-800d-74443b8905c5@ix.netcom.com>

On 6/11/2016 8:25 PM, Andrew Cunningham wrote:
> > “If you add a [compatibility] feature to match behavior
> > [found] somewhere else [not in the Unicode standard],
> > it rarely pays to make that perform "better", because
> > it just means it's now different and no longer matches
> > [the behavior to which it was supposed to be compatible].
> >
> > The exception is a feature for which you can establish
> > unambiguously that there is a metric of correctness or
> > a widely (universally?) shared expectation by users
> > as to the ideal behavior. In that case, being compatible
> > with a broken feature (or a random implementation of one)
> > may in fact be counter productive.”
> >

In the case of Mende Kikakui methods for encoding number 10,
I don't see where the "compatibility" with an existing implementation
of that number system comes into play.

My statement was a warning to not add features for the sake
of "compatibility", but then to break that compatibility by making
the feature "better" - i.e. different.

You can have one, but not the other. Either a new (better/correct)
feature, or one that is compatible.

A./


From asmusf at ix.netcom.com  Sun Jun 12 10:59:30 2016
From: asmusf at ix.netcom.com (Asmus Freytag (c))
Date: Sun, 12 Jun 2016 08:59:30 -0700
Subject: Latin Letters Capital and Small Theta
In-Reply-To: <748151443.3396.1465728070818.JavaMail.www@wwinf1g04>
References: <6206A937BCA14C3C942388D7EFC9407A@DougEwell>
 <748151443.3396.1465728070818.JavaMail.www@wwinf1g04>
Message-ID: <cfdc2add-e5a6-83e4-a3bf-ac4578d76839@ix.netcom.com>

An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20160612/19973130/attachment.html>

From frederic.grosshans at gmail.com  Mon Jun 13 07:41:18 2016
From: frederic.grosshans at gmail.com (=?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Grosshans?=)
Date: Mon, 13 Jun 2016 14:41:18 +0200
Subject: Latin Letters Capital and Small Theta
In-Reply-To: <6206A937BCA14C3C942388D7EFC9407A@DougEwell>
References: <6206A937BCA14C3C942388D7EFC9407A@DougEwell>
Message-ID: <575EA9EE.6080809@gmail.com>

On 12/06/2016 02:20, Doug Ewell wrote:
> Marcel Schneider wrote:
>
>> While some characters were retained, others were rejected, among which
>> the Latin Theta pair, but no mention is found of this rejection in the
>> Non-Approval Notices.
>
> Lots of characters in proposals are rejected without rising to the 
> level of explicit disapproval: "Look, we said NO, and don't ask us 
> again." The Non-Approval Notices page starts with an extensive 
> description of the difference.
>
> At the same time, note that a few proposals, such as LATIN CAPITAL 
> LETTER SHARP S, have risen phoenix-like from the ranks of 
> non-approvaldom to become genuine encoded characters.
And, if I remember correctly, no proposal for the Latin letter theta
has yet given examples of the current usage of theta in a Latin
orthography, such as Rromani
(http://www.rromaniconnect.org/Rromanifonts.html,
http://romani.humanities.manchester.ac.uk/whatis/status/codification.shtml).
I guess a proposal based on the Rromani orthography (and with input
from the user community, of course!) would easily be accepted.

    Cheers,

         Frédéric

From mark at macchiato.com  Mon Jun 13 08:04:10 2016
From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?=)
Date: Mon, 13 Jun 2016 15:04:10 +0200
Subject: Latin Letters Capital and Small Theta
In-Reply-To: <cfdc2add-e5a6-83e4-a3bf-ac4578d76839@ix.netcom.com>
References: <6206A937BCA14C3C942388D7EFC9407A@DougEwell>
 <748151443.3396.1465728070818.JavaMail.www@wwinf1g04>
 <cfdc2add-e5a6-83e4-a3bf-ac4578d76839@ix.netcom.com>
Message-ID: <CAJ2xs_Fr5x53XwYLLrAaCzFtUgQB0+7mnBW8gRGsumHyT=LXxA@mail.gmail.com>

> such as URLs (domain names) there are restrictions that prevent
> script-mixing in a single label.

That is just a current implementation restriction, based on only using the
Script property. Implementations upgraded to use Script_Extensions to test
for multiple scripts in a string can handle multiple scripts for a
character properly.
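
The difference between the two tests can be sketched as a set intersection over the characters of a label. The data below is toy data, NOT the real Unicode Character Database: in particular, giving U+03B8 a Script_Extensions value of {Grek, Latn} is purely a hypothetical assumption for illustration, and the function names are invented here.

```python
def script(ch):
    """Toy Script property covering only a-z and small theta."""
    if ch == "\u03b8":
        return "Grek"
    if "a" <= ch <= "z":
        return "Latn"
    raise ValueError(f"character not covered by this toy table: {ch!r}")


# Hypothetical Script_Extensions entry, assumed only for this sketch.
HYPOTHETICAL_SCX = {"\u03b8": frozenset({"Grek", "Latn"})}


def script_extensions(ch):
    """Fall back to the single Script value when no extension is listed."""
    return HYPOTHETICAL_SCX.get(ch, frozenset({script(ch)}))


def is_single_script(label, use_extensions):
    """True if at least one script is shared by every character."""
    if not label:
        return True
    sets = [
        script_extensions(ch) if use_extensions else frozenset({script(ch)})
        for ch in label
    ]
    return bool(frozenset.intersection(*sets))


label = "ka\u03b8e"  # arbitrary test string mixing a-z with theta
print(is_single_script(label, use_extensions=False))
print(is_single_script(label, use_extensions=True))
```

With the Script property alone the label counts as mixed; with the assumed extension set the intersection is non-empty and the label passes as single-script.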

Mark

On Sun, Jun 12, 2016 at 5:59 PM, Asmus Freytag (c) <asmusf at ix.netcom.com> wrote:

> Just a note: for any living(!) language, it is important that Unicode not
> mix scripts, but instead *disunify characters based on script*. The
> reason for that is that in important implementations, such as URLs (domain
> names) there are restrictions that prevent script-mixing in a single label.
>
> If an orthography were to use characters from more than one script, it will
> not be supported for things like domain names (at least in certain zones),
> because doing so introduces an element of risk to the domain name system.
>
> This consideration is less relevant to dead languages and notational
> systems, because supporting them in URLs has never been a priority and
> users can live with a "best case" or partial solution. Living languages,
> esp. ones that have (or are expected to achieve) significant literate
> populations, are a different matter.
>
> A./
>

From verdy_p at wanadoo.fr  Mon Jun 13 12:52:14 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Mon, 13 Jun 2016 19:52:14 +0200
Subject: Latin Letters Capital and Small Theta
In-Reply-To: <575EA9EE.6080809@gmail.com>
References: <6206A937BCA14C3C942388D7EFC9407A@DougEwell>
 <575EA9EE.6080809@gmail.com>
Message-ID: <CAGa7JC3HfDc3EJ5=aY_aLbsLAMTCateSe3Gt2s2U42cEjLs9FQ@mail.gmail.com>

This is general. Characters may initially be encoded with a single case
when the only demonstrated use is IPA usage (which is single-cased). To
get dual-cased letters, we need to find examples of use in the orthography
of a language where all other letters are dual-cased. Well, this was true for
the German sharp S, but for a long time it was not demonstrated that the
lowercase and uppercase were different.

With Rromani (which has multiple orthographies in multiple scripts), the
problem is that there's no formal standard, and the Rromani communities
in various countries have adapted their orthography to usages found in other
national languages. There's no real academy, the language is very
fragmented, and its tradition is in fact more oral than written. There are
authors of written texts, but each one has adopted a convention more or less
based on the standard orthography of another language where they live. So
there are variants of the orthography in multiple scripts, at least Latin,
Cyrillic, Greek, Devanagari (probably also Arabic in North-Eastern India,
Pakistan, Iran; maybe also Georgian: the Rromani people are spread over a
very large area from Southern Asia, Central Asia, Western Asia, to Europe
and North Africa). The orthographies are more or less adaptations of the
phonetics of the oral tradition.

For those authors who want to better represent the language's phonetics, it's
natural that they'll want to borrow the IPA theta symbol when choosing the
Latin script (and in the Greek-based orthography they'll correctly
differentiate the Greek Tau and Theta letters for the same purpose). I
wonder which letters they choose to differentiate Tau and Theta in Cyrillic
(there's a sizeable Rromani community in Bulgaria, Macedonia, Serbia...).
But in the Latin script, authors have also long used digraphs (T vs. TH),
just as in other European languages, including English or French (even
if French does not differentiate the phonetics and the H in TH is in fact
completely silent!).

There are actually no stable transliterators, because there are competing
orthographies depending on authors, no formal agreements between
authors, and no widely recognized academic institution (there are
several local communities that may have authored some writing guides, but I
don't think these are strong enough to be authoritative: the tradition is
still strongly oral, and what is important is not the way the language is
written but how it is pronounced and sung: music and songs are an essential
part of the Rromani culture, and what unites them across countries, even if
there are some religious splits).

It's normal for Unicode to accept the existence of Latin orthographies that
use the Theta letter as a normal dual-cased letter if we can
demonstrate that authors need it and that publications were made and
are relatively easy to find. Those publications are part of our world cultures
and need to be preserved and correctly represented, even if we don't have
any formal academy. It is even more important than encoding many new emojis
for fun (which are recent inventions and don't have the same level of
historic background).

Being able to write all languages even if their historic tradition is oral,
is an important and respectable goal, notably when these are living
languages with a large speaking community. It's not something new: various
native African languages have also adopted IPA symbols in their Latin
orthography, and wanted to have dual case. So now we also have dual-cased
Latin letters Alpha, Epsilon, Open O... It does not matter that IPA only
needs lowercase: it has become a strong common base for
orthographies of languages with oral traditions, and it is natural for them to
expand the IPA set with capital letters for the Latin script (further
evidence that IPA is not a separate script but a subset of the Latin script).


2016-06-13 14:41 GMT+02:00 Frédéric Grosshans <frederic.grosshans at gmail.com>
:

> On 12/06/2016 02:20, Doug Ewell wrote:
>
>> Marcel Schneider wrote:
>>
>> While some characters were retained, others were rejected, among which
>>> the Latin Theta pair, but no mention is found of this rejection in the
>>> Non-Approval Notices.
>>>
>>
>> Lots of characters in proposals are rejected without rising to the level
>> of explicit disapproval: "Look, we said NO, and don't ask us again." The
>> Non-Approval Notices page starts with an extensive description of the
>> difference.
>>
>> At the same time, note that a few proposals, such as LATIN CAPITAL LETTER
>> SHARP S, have risen phoenix-like from the ranks of non-approvaldom to
>> become genuine encoded characters.
>>
> And, if I remember correctly, no proposal for the Latin letter theta
> has yet given an example of the current usage of theta in Latin orthography,
> like in Rromani (http://www.rromaniconnect.org/Rromanifonts.html,
> http://romani.humanities.manchester.ac.uk/whatis/status/codification.shtml
> ). I guess a proposal based on the Rromani orthography (and with input from
> the user community, of course!) would easily be accepted.
>
>    Cheers,
>
>         Frédéric
>

From rick at unicode.org  Wed Jun 15 21:34:10 2016
From: rick at unicode.org (Rick McGowan)
Date: Wed, 15 Jun 2016 19:34:10 -0700
Subject: Public review of draft repertoire for ISO/IEC 10646
Message-ID: <57621022.1070209@unicode.org>

The UTC would appreciate feedback on new repertoire that is currently 
under ballot for future additions to ISO/IEC 10646. This includes 
repertoire that has already been reviewed and approved by the UTC, but 
which will not be published until next year, as part of Version 10.0 of 
the Unicode Standard.

This is your opportunity to review the planned new repertoire for 
possible problems, and to make any suggestions you might have about 
improvements for glyphs or character names.

See PRI #327 <http://www.unicode.org/review/pri327/> and PRI #328 
<http://www.unicode.org/review/pri328/> for details on access to the 
draft repertoire documents for review, and for how to provide your 
feedback. The characters of interest -- the new repertoire under ballot 
-- are highlighted in yellow in the code charts in those documents. 
Glyph corrections or improvements in the charts are highlighted in a 
light blue.

Note that we already know about the mistaken glyph for the new character 
U+1D378 TALLY MARK FIVE, so you do not need to report that problem again!

Note also that a few of the characters for review in PRI #328, including 
the 72 new emoji characters, have been accelerated for publication in 
Unicode 9.0. The UTC will not be able to respond to further feedback on 
those 9.0 characters, which are already frozen for publication.


From gwalla at gmail.com  Thu Jun 16 02:09:03 2016
From: gwalla at gmail.com (Garth Wallace)
Date: Thu, 16 Jun 2016 00:09:03 -0700
Subject: Public review of draft repertoire for ISO/IEC 10646
In-Reply-To: <57621022.1070209@unicode.org>
References: <57621022.1070209@unicode.org>
Message-ID: <CA+p4_H08cQky0SDrbKkTcwDCyFPkLch6W3MkW1XP=2HVAgskGA@mail.gmail.com>

I'm not sure if it merits formal feedback, but would it be a good idea to
cross reference IDEOGRAPHIC TALLY MARK FIVE to CJK UNIFIED IDEOGRAPH-6B63?
They are effectively visually identical (in fact I was under the impression
they were the same thing).

On Wed, Jun 15, 2016 at 7:34 PM, Rick McGowan <rick at unicode.org> wrote:

> The UTC would appreciate feedback on new repertoire that is currently
> under ballot for future additions to ISO/IEC 10646. This includes
> repertoire that has already been reviewed and approved by the UTC, but
> which will not be published until next year, as part of Version 10.0 of the
> Unicode Standard.
>
> This is your opportunity to review the planned new repertoire for possible
> problems, and to make any suggestions you might have about improvements for
> glyphs or character names.
>
> See PRI #327 <http://www.unicode.org/review/pri327/> and PRI #328
> <http://www.unicode.org/review/pri328/> for details on access to the
> draft repertoire documents for review, and for how to provide your
> feedback. The characters of interest -- the new repertoire under ballot --
> are highlighted in yellow in the code charts in those documents. Glyph
> corrections or improvements in the charts are highlighted in a light blue.
>
> Note that we already know about the mistaken glyph for the new character
> U+1D378 TALLY MARK FIVE, so you do not need to report that problem again!
> Note also that a few of the characters for review in PRI #328, including
> the 72 new emoji characters, have been accelerated for publication in
> Unicode 9.0. The UTC will not be able to respond to further feedback on
> those 9.0 characters, which are already frozen for publication.
>
>

From charupdate at orange.fr  Thu Jun 16 05:42:47 2016
From: charupdate at orange.fr (Marcel Schneider)
Date: Thu, 16 Jun 2016 12:42:47 +0200 (CEST)
Subject: Latin Letters Capital and Small Theta
In-Reply-To: <CAGa7JC3HfDc3EJ5=aY_aLbsLAMTCateSe3Gt2s2U42cEjLs9FQ@mail.gmail.com>
References: <6206A937BCA14C3C942388D7EFC9407A@DougEwell>
 <575EA9EE.6080809@gmail.com>
 <CAGa7JC3HfDc3EJ5=aY_aLbsLAMTCateSe3Gt2s2U42cEjLs9FQ@mail.gmail.com>
Message-ID: <2064827394.7993.1466073767346.JavaMail.www@wwinf1j27>

On Sun, 12 Jun 2016 08:59:30 -0700, Asmus Freytag (c) wrote:
> Just a note: for any living(!) language, it is important that Unicode not mix 
> scripts, but instead *disunify characters based on script.* The reason for that 
> is that in important implementations, such as URLs (domain names) there are 
> restrictions that prevent script-mixing in a single label.

On Mon, 13 Jun 2016 14:41:18 +0200, Frédéric Grosshans wrote:
> On 12/06/2016 02:20, Doug Ewell wrote:
>> Marcel Schneider wrote:
>>
>>> While some characters were retained, others were rejected, among which
>>> the Latin Theta pair, but no mention is found of this rejection in the
>>> Non-Approval Notices.
>>
>> Lots of characters in proposals are rejected without rising to the
>> level of explicit disapproval: "Look, we said NO, and don't ask us
>> again." The Non-Approval Notices page starts with an extensive
>> description of the difference.
>>
>> At the same time, note that a few proposals, such as LATIN CAPITAL
>> LETTER SHARP S, have risen phoenix-like from the ranks of
>> non-approvaldom to become genuine encoded characters.
> And, if I remember correctly, no proposal for the Latin letter theta
> has yet given an example of the current usage of theta in Latin
> orthography, like in Rromani
> (http://www.rromaniconnect.org/Rromanifonts.html,
> http://romani.humanities.manchester.ac.uk/whatis/status/codification.shtml
> ). I guess a proposal based on the Rromani orthography (and with input
> from the user community, of course!) would easily be accepted.

On Mon, 13 Jun 2016 19:52:14 +0200, Philippe Verdy wrote:
> With Rromani (which has multiple orthographies in multiple scripts), the
> problem is that there's no formal standard, and the Rromani communities
> in various countries have adapted their orthography to usages found in other
> national languages. There's no real academy, the language is very
> fragmented, and its tradition is in fact more oral than written. There are
> authors of written texts, but each one has adopted a convention more or less
> based on the standard orthography of another language where they live. […]
> There are actually no stable transliterators, because there are competing
> orthographies depending on authors, no formal agreements between
> authors, and no widely recognized academic institution […]
> 
> It's normal for Unicode to accept the existence of Latin orthographies that
> use the Theta letter as a normal dual-cased letter if we can
> demonstrate that authors need it and that publications were made and
> are relatively easy to find. Those publications are part of our world cultures
> and need to be preserved and correctly represented, even if we don't have
> any formal academy. It is even more important than encoding many new emojis
> for fun (which are recent inventions and don't have the same level of
> historic background).
> 
> Being able to write all languages even if their historic tradition is oral,
> is an important and respectable goal, notably when these are living
> languages with a large speaking community. It's not something new: various
> native African languages have also adopted IPA symbols in their Latin
> orthography, and wanted to have dual case. So now we also have dual-cased
> Latin letters Alpha, Epsilon, Open O... It does not matter if IPA only
> needs lowercase, but it has become a strong common base used for
> orthographies of languages with oral traditions, and natural for them to
> expand the IPA set with capital letters for the Latin script (and another
> proof that IPA is not a separate script but a subset of the Latin script).

Thanks to all who responded in this thread. The challenge as I see it now 
is to spread the word and motivate persons who are in touch with the 
Rromani Standard Alphabet user communities, and are thus in a position 
to write up the proposal for Latin Letters Capital and Small Theta.

As for the subsequent font issue, I believe that it will be settled quickly 
by adding the new code points and duplicating the already existing glyphs of 
GREEK CAPITAL THETA SYMBOL and GREEK SMALL LETTER THETA.

Hopefully,

Marcel


From daniel.buenzli at erratique.ch  Sun Jun 19 08:25:44 2016
From: daniel.buenzli at erratique.ch (Daniel Bünzli)
Date: Sun, 19 Jun 2016 14:25:44 +0100
Subject: 9.0.0 segmentation and line breaks on the empty string
In-Reply-To: <46731C3271204F11AEE86273A1BC9922@erratique.ch>
References: <46731C3271204F11AEE86273A1BC9922@erratique.ch>
Message-ID: <B8BC4F25E61E4988A23CE496D89359F4@erratique.ch>

On Sunday, 12 June 2016 at 14:26, Daniel Bünzli wrote:
> Hello,
>
> I notice that in 9.0.0, UAX29 segmentations no longer report boundaries on the empty string while UAX14 still does report a hard line break on it. Is this intended? And what is the rationale behind these changes and non-changes?
>
> While I think that the proposed UAX29 is a better one, these kinds of changes on special cases make it easy to break assumptions made by client code, so it would be better if these things did not change too often. Hence my request: shouldn't UAX14 also report no breaks on the empty string?

I realize we are out of the beta review time. But do people think it would be worth raising for 10.0.0?

Best,  

Daniel


From public at khwilliamson.com  Sun Jun 19 10:57:28 2016
From: public at khwilliamson.com (Karl Williamson)
Date: Sun, 19 Jun 2016 09:57:28 -0600
Subject: 9.0.0 segmentation and line breaks on the empty string
In-Reply-To: <B8BC4F25E61E4988A23CE496D89359F4@erratique.ch>
References: <46731C3271204F11AEE86273A1BC9922@erratique.ch>
 <B8BC4F25E61E4988A23CE496D89359F4@erratique.ch>
Message-ID: <5766C0E8.3050700@khwilliamson.com>

On 06/19/2016 07:25 AM, Daniel Bünzli wrote:
> On Sunday, 12 June 2016 at 14:26, Daniel Bünzli wrote:
>> Hello,
>>
>> I notice that in 9.0.0, UAX29 segmentations no longer report boundaries on the empty string while UAX14 still does report a hard line break on it. Is this intended? And what is the rationale behind these changes and non-changes?
>>
>> While I think that the proposed UAX29 is a better one, these kinds of changes on special cases make it easy to break assumptions made by client code, so it would be better if these things did not change too often. Hence my request: shouldn't UAX14 also report no breaks on the empty string?
> I realize we are out of the beta review time. But do people think it would be worth raising for 10.0.0?
>
> Best,
>
> Daniel
>
>

Yes.  Use http://www.unicode.org/reporting.html to make an error report. 
  I did this last year to report about the empty strings matching, and 
TR29 got changed for 9.0.  (Perhaps others reported it too.)  I was 
aware that the problem was also in TR14, but I don't remember now, I 
could very well have not included this in my submission.  And the 
Unicode personnel are busy people, and like me, can overlook things, and 
fail to draw logical inferences that, in retrospect, appear to be obvious.


From daniel.buenzli at erratique.ch  Sun Jun 19 11:34:08 2016
From: daniel.buenzli at erratique.ch (Daniel Bünzli)
Date: Sun, 19 Jun 2016 17:34:08 +0100
Subject: 9.0.0 segmentation and line breaks on the empty string
In-Reply-To: <5766C0E8.3050700@khwilliamson.com>
References: <46731C3271204F11AEE86273A1BC9922@erratique.ch>
 <B8BC4F25E61E4988A23CE496D89359F4@erratique.ch>
 <5766C0E8.3050700@khwilliamson.com>
Message-ID: <C61A097CFA55489B98615820E0E87C8D@erratique.ch>

On Sunday, 19 June 2016 at 16:57, Karl Williamson wrote:
> Yes. Use http://www.unicode.org/reporting.html to make an error report.

Thanks, did that.

Best,  

Daniel


From andy.heninger at gmail.com  Mon Jun 20 17:32:12 2016
From: andy.heninger at gmail.com (Andy Heninger)
Date: Mon, 20 Jun 2016 15:32:12 -0700
Subject: 9.0.0 segmentation and line breaks on the empty string
In-Reply-To: <C61A097CFA55489B98615820E0E87C8D@erratique.ch>
References: <46731C3271204F11AEE86273A1BC9922@erratique.ch>
 <B8BC4F25E61E4988A23CE496D89359F4@erratique.ch>
 <5766C0E8.3050700@khwilliamson.com>
 <C61A097CFA55489B98615820E0E87C8D@erratique.ch>
Message-ID: <CAEtzAy5krkxO_f0LGcmJJPB15-E7EqMF==+L5j7iUsFT+0j2vQ@mail.gmail.com>

>
> I notice that in 9.0.0, UAX29 segmentations no longer report boundaries on
> the empty string while UAX14 still does


This is an interesting edge case.

My reading of UAX 14 is that an empty string would not produce a break.
Both "sot" and "eot" would be true, so LB2,
    sot ×
would match and apply, and that would be the end of the story. LB3 would
never be applied because LB2 would match first.
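
The rule ordering described above can be sketched as follows. This is a minimal illustration that models only LB2 and LB3 and nothing else from UAX #14; the function name and the string return values are assumptions made for the example:

```python
def boundary_action(text, i):
    """Classify boundary position i (0..len(text)) using only LB2 and LB3.

    Returns "no break" when LB2 applies, "mandatory break" when LB3
    applies, and None when neither does (later rules, not modelled here,
    would decide). Rules are tried in order and the first match wins.
    """
    sot = i == 0
    eot = i == len(text)
    if sot:  # LB2: sot ×
        return "no break"
    if eot:  # LB3: ! eot
        return "mandatory break"
    return None
```

For the empty string the only position is 0, where both conditions hold, so LB2 is matched first and no break is reported. For a one-character string, position 0 gives no break and position 1 gives the mandatory break of LB3.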

As to mandating a hard break at the end of text (LB3), I'm not at all sure
this was a good idea. It seems like the breaking behavior would depend on
the external context of the text, about which the LB algorithm knows
nothing. It's different from having text that ends with a LF or other
hard-break character. But I'm also disinclined to suggest changes in this
area; the possibility of breaking applications that have come to expect the
existing behavior seems real, and it's all edge cases.

  -- Andy

On Sun, Jun 19, 2016 at 9:34 AM, Daniel Bünzli <daniel.buenzli at erratique.ch>
wrote:

> On Sunday, 19 June 2016 at 16:57, Karl Williamson wrote:
> > Yes. Use http://www.unicode.org/reporting.html to make an error report.
>
> Thanks, did that.
>
> Best,
>
> Daniel
>
>
>
>

From daniel.buenzli at erratique.ch  Mon Jun 20 17:49:12 2016
From: daniel.buenzli at erratique.ch (Daniel Bünzli)
Date: Mon, 20 Jun 2016 23:49:12 +0100
Subject: 9.0.0 segmentation and line breaks on the empty string
In-Reply-To: <CAEtzAy5krkxO_f0LGcmJJPB15-E7EqMF==+L5j7iUsFT+0j2vQ@mail.gmail.com>
References: <46731C3271204F11AEE86273A1BC9922@erratique.ch>
 <B8BC4F25E61E4988A23CE496D89359F4@erratique.ch>
 <5766C0E8.3050700@khwilliamson.com>
 <C61A097CFA55489B98615820E0E87C8D@erratique.ch>
 <CAEtzAy5krkxO_f0LGcmJJPB15-E7EqMF==+L5j7iUsFT+0j2vQ@mail.gmail.com>
Message-ID: <85B3F496F90E4D8E91A1F132C29493BA@erratique.ch>

On Monday, 20 June 2016 at 23:32, Andy Heninger wrote:
> My reading of UAX 14 is that an empty string would not produce a break. Both "sot" and "eot" would be true, so LB2, sot × would match and apply, and that would be the end of the story.

Uh. I just checked my own implementation and that's actually what happens (I actually even have a test for this?). I guess I read the clarifications of UAX29 and wrongly remembered the rules were the same on the empty string in UAX 14.

So maybe take my report as a request for clarification?

Thanks for the answer and sorry for the noise,  

Daniel


From doug at ewellic.org  Tue Jun 21 09:43:34 2016
From: doug at ewellic.org (Doug Ewell)
Date: Tue, 21 Jun 2016 07:43:34 -0700
Subject: Release date?
Message-ID: <20160621074334.665a7a7059d7ee80bb4d670165c8327d.5132cfa6a7.wbe@email03.godaddy.com>

http://opiniojuris.org/2016/06/20/emojis-and-international-law

"And tomorrow, June 21, we will have 71 new emojis to play with."

Do only bloggers and the press get notified in advance of the release
date of Unicode 9.0?

--
Doug Ewell | http://ewellic.org | Thornton, CO


From kenwhistler at att.net  Tue Jun 21 10:03:40 2016
From: kenwhistler at att.net (Ken Whistler)
Date: Tue, 21 Jun 2016 08:03:40 -0700
Subject: Release date?
In-Reply-To: <20160621074334.665a7a7059d7ee80bb4d670165c8327d.5132cfa6a7.wbe@email03.godaddy.com>
References: <20160621074334.665a7a7059d7ee80bb4d670165c8327d.5132cfa6a7.wbe@email03.godaddy.com>
Message-ID: <c6332d80-ce9a-a565-3cc4-c9468ecac74e@att.net>

Doug,


On 6/21/2016 7:43 AM, Doug Ewell wrote:
> "And tomorrow, June 21, we will have 71 new emojis to play with."
>
> Do only bloggers and the press get notified in advance of the release
> date of Unicode 9.0?

They are getting it from the same place all of the members and anybody else
could have been seeing it, the draft 9.0.0 landing page we've had up for
months as part of the beta review:

http://www.unicode.org/versions/Unicode9.0.0/

That used to just say, June 2016, but then we got more explicit when June
got closer, and we could plan to an exact date.

The page was also easily accessible until a couple of days ago, when we took
down the link to the old 9.0.0 beta review page in preparation for the actual
release.

Oh, and for several days now, we've been tweeting that the release is
imminent.

In fact... wait a few hours, and it will be here... ;-)

--Ken

From jaruga at redhat.com  Tue Jun 21 08:54:34 2016
From: jaruga at redhat.com (Jun Aruga)
Date: Tue, 21 Jun 2016 09:54:34 -0400 (EDT)
Subject: The license for Unihan v1.1
In-Reply-To: <975764717.1628778.1466514955212.JavaMail.zimbra@redhat.com>
Message-ID: <1169826287.1673685.1466517274485.JavaMail.zimbra@redhat.com>

Hello,

I would like to ask about the license of the old Unicode mapping data, version 1.1.

I am working on the Ruby package.
There are files in the Ruby package with Unihan GB12345-90 (GB12345-80) and GB2312-80 data from Unicode version 1.1. [1][2]
I would like to ask which license should be used for these files.

When I checked the latest version of Unihan.zip [3] of the Unicode mapping, and the ReadMe.txt [4], I found that "Unicode Character Database (UCD)" was used.
However, when I checked the directory for version 1.1 [1], it seemed that the Unihan data has disappeared from the web site directory. [5]
It seems that the "Unicode license" had been used for that.

So, do you have any idea which license I should use?

Thanks.


[1] https://github.com/ruby/ruby/tree/ruby_2_3/enc/trans/GB
    4 files in this directory.
[2] http://unicode.org/reports/tr38/
[3] http://www.unicode.org/Public/9.0.0/ucd/Unihan.zip
[4] http://www.unicode.org/Public/9.0.0/ucd/ReadMe.txt
    > # Unicode Character Database
[5] http://www.unicode.org/Public/1.1-Update/

Jun Aruga

From doug at ewellic.org  Tue Jun 21 10:18:30 2016
From: doug at ewellic.org (Doug Ewell)
Date: Tue, 21 Jun 2016 08:18:30 -0700
Subject: Release date?
Message-ID: <20160621081830.665a7a7059d7ee80bb4d670165c8327d.b4c0de5e60.wbe@email03.godaddy.com>

Ken Whistler wrote:

>> Do only bloggers and the press get notified in advance of the release
>> date of Unicode 9.0?
>
> They are getting it from the same place all of the members and anybody
> else could have been seeing it, the draft 9.0.0 landing page we've had
> up for months as part of the beta review:
>
> http://www.unicode.org/versions/Unicode9.0.0/
>
> That used to just say, June 2016, but then we got more explicit when
> June got closer, and we could plan to an exact date.

Sorry, I wasn't aware that page was being updated with new date
information. My apologies.

--
Doug Ewell | http://ewellic.org | Thornton, CO


From public at khwilliamson.com  Tue Jun 21 10:14:55 2016
From: public at khwilliamson.com (Karl Williamson)
Date: Tue, 21 Jun 2016 09:14:55 -0600
Subject: Release date?
In-Reply-To: <20160621074334.665a7a7059d7ee80bb4d670165c8327d.5132cfa6a7.wbe@email03.godaddy.com>
References: <20160621074334.665a7a7059d7ee80bb4d670165c8327d.5132cfa6a7.wbe@email03.godaddy.com>
Message-ID: <576959EF.1000402@khwilliamson.com>

On 06/21/2016 08:43 AM, Doug Ewell wrote:
> http://opiniojuris.org/2016/06/20/emojis-and-international-law
>
> "And tomorrow, June 21, we will have 71 new emojis to play with."
>
> Do only bloggers and the press get notified in advance of the release
> date of Unicode 9.0?

http://www.unicode.org/versions/Unicode9.0.0/


From daniel.buenzli at erratique.ch  Tue Jun 21 11:02:15 2016
From: daniel.buenzli at erratique.ch (Daniel Bünzli)
Date: Tue, 21 Jun 2016 17:02:15 +0100
Subject: UAX 29 9.0.0 new emoji flag rules questions and comments
Message-ID: <E8B8FEF093EB4288882FE9F17B89E745@erratique.ch>

I have a few questions/comments about the new emoji segmentation rules in 9.0.0

1. I have trouble understanding what the ^ symbol means in these rules:  

http://www.unicode.org/reports/tr29/proposed.html#GB8a
http://www.unicode.org/reports/tr29/proposed.html#WB15

Does it correspond to the regexp start-of-line (SOL) symbol? If so, SOL is a bit ambiguous in that context: it could also mean that you need to match starts of lines, which is a whole different business. Couldn't it simply be replaced by sot?

2. Besides, given that with the GB8* rules you need to be able to count an odd number of RI, it seems to me that the sentence "Grapheme cluster boundaries can be easily tested by looking at immediately adjacent characters." is no longer accurate.

3. There are two rules named GB8c.

4. In §1.1 the link to UTS18 is broken (#RegEx does not exist in UAX 41).  
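
The counting mentioned in point 2 can be sketched as follows. This is a minimal illustration, not the UAX #29 algorithm: the helper names are hypothetical, and it only models the idea that two adjacent regional indicators (RI) stay together when an odd number of RI precedes the boundary, which cannot be decided from the two adjacent characters alone:

```python
def is_ri(ch):
    """True for Regional Indicator symbols U+1F1E6..U+1F1FF."""
    return "\U0001F1E6" <= ch <= "\U0001F1FF"


def break_between_ri(text, i):
    """Decide whether to break between text[i - 1] and text[i], both RI.

    Counts the run of RI that ends just before position i. An odd run
    means text[i - 1] opens a pair that text[i] completes (no break); an
    even run means the previous pairs are complete (break).
    """
    if not (0 < i < len(text) and is_ri(text[i - 1]) and is_ri(text[i])):
        raise ValueError("position is not between two regional indicators")
    run = 0
    j = i - 1
    while j >= 0 and is_ri(text[j]):
        run += 1
        j -= 1
    return run % 2 == 0
```

For a string of four RI (two flags), the boundaries at positions 1 and 3 fall inside a pair and position 2 falls between the pairs; telling them apart requires scanning back over the whole run.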

Best,  

Daniel  


From daniel.buenzli at erratique.ch  Tue Jun 21 16:19:31 2016
From: daniel.buenzli at erratique.ch (Daniel Bünzli)
Date: Tue, 21 Jun 2016 22:19:31 +0100
Subject: UAX29 9.0.0 Grapheme cluster spec & test discrepancy
Message-ID: <3E23D9AA9EF7402BA96D1D8D14579CEF@erratique.ch>

Hello,

It seems there's a discrepancy between the tests and the spec for grapheme clusters. In

 http://www.unicode.org/Public/9.0.0/ucd/auxiliary/GraphemeBreakTest.txt  

we have:  

÷ 261D × 0308 × 1F3FB ÷  
# ÷ [0.2] WHITE UP POINTING INDEX (E_Base)  
# × [9.0] COMBINING DIAERESIS (Extend)  
# × [10.0] EMOJI MODIFIER FITZPATRICK TYPE-1-2 (E_Modifier) ÷ [0.3]

which is  

 http://www.unicode.org/Public/9.0.0/ucd/auxiliary/GraphemeBreakTest.html#r10.0

but the spec doesn't talk about interleaved Extend*:  

 http://www.unicode.org/reports/tr29/proposed.html#GB10

It seems following the spec this would be:  

÷ 261D × 0308 ÷ 1F3FB ÷

Which one is right?

Best,  

Daniel


From liancu at microsoft.com  Tue Jun 21 19:32:08 2016
From: liancu at microsoft.com (Laurentiu Iancu)
Date: Wed, 22 Jun 2016 00:32:08 +0000
Subject: UAX 29 9.0.0 new emoji flag rules questions and comments
In-Reply-To: <E8B8FEF093EB4288882FE9F17B89E745@erratique.ch>
References: <E8B8FEF093EB4288882FE9F17B89E745@erratique.ch>
Message-ID: <CY1PR0301MB1994772BA9200CAF349CEA6EDD2C0@CY1PR0301MB1994.namprd03.prod.outlook.com>

Hello,


Re #1, the ^ symbol indeed denotes a start-of-line anchor, in usual regex notation, and the corresponding rules could use sot instead.


Re #2, that was an oversight, and will be addressed in the Proposed Update of UAX #29 for Unicode 10.0.


Re #3 and #4, both were addressed before the release of Version 9.0.


For suggestions such as #1, which require review by the UTC, please remember to use the feedback reporting form.


Thank you,

L.


-----Original Message-----
From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Daniel Bünzli
Sent: Tuesday, June 21, 2016 9:02 AM
To: Unicode Public <unicode at unicode.org>
Subject: UAX 29 9.0.0 new emoji flag rules questions and comments


I have a few questions/comments about the new emoji segmentation rules in 9.0.0


1. I have trouble understanding what the ^ symbol means in these rules:


http://www.unicode.org/reports/tr29/proposed.html#GB8a

http://www.unicode.org/reports/tr29/proposed.html#WB15


Does it correspond to the regexp start-of-line (SOL) symbol? If so, SOL is a bit ambiguous in that context: it could also mean that you need to match starts of lines, which is a whole different business. Couldn't it simply be replaced by sot?


2. Besides, given that with the GB8* rules you need to be able to count an odd number of RI, it seems to me that the sentence "Grapheme cluster boundaries can be easily tested by looking at immediately adjacent characters." is no longer accurate.


3. There are two rules named GB8c.


4. In §1.1 the link to UTS18 is broken (#RegEx does not exist in UAX 41).


Best,


Daniel



From liancu at microsoft.com  Tue Jun 21 19:32:46 2016
From: liancu at microsoft.com (Laurentiu Iancu)
Date: Wed, 22 Jun 2016 00:32:46 +0000
Subject: UAX29 9.0.0 Grapheme cluster spec & test discrepancy
In-Reply-To: <3E23D9AA9EF7402BA96D1D8D14579CEF@erratique.ch>
References: <3E23D9AA9EF7402BA96D1D8D14579CEF@erratique.ch>
Message-ID: <CY1PR0301MB199469A0DB17B43EFEFA0EB4DD2C0@CY1PR0301MB1994.namprd03.prod.outlook.com>

Hello,


This discrepancy was addressed during the release process.  Please refer to the published Version 9.0 of UAX #29 and the UCD files.
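
A minimal sketch of the difference at issue, assuming a rule of the shape (E_Base | E_Base_GAZ) Extend* × E_Modifier, which is what the test data implies. The property table below is a three-entry toy and the function name is hypothetical; the `allow_extend` flag switches between a reading that skips interleaved Extend characters and one that requires the base to be immediately adjacent:

```python
# Toy Grapheme_Cluster_Break values (illustration only; real values come
# from GraphemeBreakProperty.txt in the UCD).
GCB = {
    0x261D: "E_Base",       # WHITE UP POINTING INDEX
    0x0308: "Extend",       # COMBINING DIAERESIS
    0x1F3FB: "E_Modifier",  # EMOJI MODIFIER FITZPATRICK TYPE-1-2
}


def gb10_no_break(cps, i, allow_extend=True):
    """True if a GB10-style rule forbids a break before cps[i]."""
    if not (0 < i < len(cps)) or GCB.get(cps[i]) != "E_Modifier":
        return False
    j = i - 1
    if allow_extend:
        # Skip any Extend characters between the base and the modifier.
        while j >= 0 and GCB.get(cps[j]) == "Extend":
            j -= 1
    return j >= 0 and GCB.get(cps[j]) in ("E_Base", "E_Base_GAZ")
```

For the sequence 261D 0308 1F3FB, the call with `allow_extend=True` keeps the modifier attached, matching the test file, while `allow_extend=False` does not, matching the reading of the draft rule quoted below.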


Regards,

L.


-----Original Message-----
From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Daniel Bünzli
Sent: Tuesday, June 21, 2016 2:20 PM
To: Unicode at unicode.org
Subject: UAX29 9.0.0 Grapheme cluster spec & test discrepancy


Hello,


It seems there's a discrepancy between the tests and the spec for grapheme clusters. In


http://www.unicode.org/Public/9.0.0/ucd/auxiliary/GraphemeBreakTest.txt


we have:


÷ 261D × 0308 × 1F3FB ÷

# ÷ [0.2] WHITE UP POINTING INDEX (E_Base) × [9.0] COMBINING DIAERESIS (Extend) × [10.0] EMOJI MODIFIER FITZPATRICK TYPE-1-2 (E_Modifier) ÷ [0.3]


which is


http://www.unicode.org/Public/9.0.0/ucd/auxiliary/GraphemeBreakTest.html#r10.0


but the spec doesn't talk about interleaved Extend*:


http://www.unicode.org/reports/tr29/proposed.html#GB10


It seems following the spec this would be:


÷ 261D × 0308 ÷ 1F3FB ÷


Which one is right?
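
The difference between the two readings can be sketched in a few lines. This is a minimal illustration, not the reference segmentation algorithm: the `CLASSES` table, `breaks` and `render` are hypothetical names, only GB9 (no break before Extend) and GB10 are modelled, and the `gb10_allows_extend` flag switches between GB10 with and without interleaved Extend*.

```python
# Grapheme_Cluster_Break classes assumed for the three code points of the test line.
CLASSES = {0x261D: "E_Base", 0x0308: "Extend", 0x1F3FB: "E_Modifier"}


def breaks(cps, gb10_allows_extend):
    """Return the decision ("×" no break, "÷" break) at each interior offset."""
    cls = [CLASSES[c] for c in cps]
    out = []
    for i in range(1, len(cls)):
        if cls[i] == "Extend":
            # GB9: never break before an Extend character.
            out.append("×")
            continue
        if cls[i] == "E_Modifier":
            j = i - 1
            if gb10_allows_extend:
                # Skip any Extend characters sitting between base and modifier.
                while j >= 0 and cls[j] == "Extend":
                    j -= 1
            if j >= 0 and cls[j] == "E_Base":
                # GB10: no break between an emoji base and its modifier.
                out.append("×")
                continue
        # Default: break everywhere else.
        out.append("÷")
    return out


def render(cps, gb10_allows_extend):
    """Format the result in the style of GraphemeBreakTest.txt."""
    marks = ["÷"] + breaks(cps, gb10_allows_extend) + ["÷"]
    parts = []
    for mark, cp in zip(marks, cps):
        parts += [mark, f"{cp:04X}"]
    parts.append(marks[-1])
    return " ".join(parts)


line = [0x261D, 0x0308, 0x1F3FB]
print(render(line, gb10_allows_extend=True))
print(render(line, gb10_allows_extend=False))
```

With the flag set, the modifier stays attached to the base across the diaeresis; without it, a break appears before the modifier.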


Best,


Daniel



From daniel.buenzli at erratique.ch  Wed Jun 22 04:22:21 2016
From: daniel.buenzli at erratique.ch (=?utf-8?Q?Daniel_B=C3=BCnzli?=)
Date: Wed, 22 Jun 2016 10:22:21 +0100
Subject: UAX 29 9.0.0 new emoji flag rules questions and comments
In-Reply-To: <CY1PR0301MB1994772BA9200CAF349CEA6EDD2C0@CY1PR0301MB1994.namprd03.prod.outlook.com>
References: <E8B8FEF093EB4288882FE9F17B89E745@erratique.ch>
 <CY1PR0301MB1994772BA9200CAF349CEA6EDD2C0@CY1PR0301MB1994.namprd03.prod.outlook.com>
Message-ID: <1945B5EB463B4B62ABDBFF876C3FB169@erratique.ch>

Thanks for the answers Laurentiu.

> For suggestions such as #1, which require review by the UTC, please remember to use the feedback reporting form.
Will do. I always prefer to first check my understanding with the list to avoid making bogus reports.

Best,  

Daniel


From verdy_p at wanadoo.fr  Wed Jun 22 05:33:57 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Wed, 22 Jun 2016 12:33:57 +0200
Subject: Announcing The Unicode(R) Standard, Version 9.0
Message-ID: <CAGa7JC0HSiW+HtQYwiF9TiNYvh0YnvCCAep9ku=iHxp2PpRDKA@mail.gmail.com>

2016-06-22 0:02 GMT+02:00 <announcements at unicode.org>:

> Important symbol additions include:
>
>    - 19 symbols for the new 4K TV standard
>
We were told that this standard is not named "4K" but "UltraHD" (UHD)...
"4K" is just a popular informal term in English-language media, or used in
commercial announcements, here also in English. It is not correctly
understood everywhere, and could lead to confusion about the required
conformance level.

[Basically, this does not just include a minimum resolution but also a set
of encoding technologies, support for encryption, and support for several
protocols -- including support for UTF-8, as this standard is now based on
web standards -- and it no longer requires the MPEG envelope, but will rather
use streaming over IP. For broadcasting, it also includes a new signal
format requiring a new hardware tuner and demultiplexer, and channels will
transport more than just audio and video; they will also have dynamically
changing parameters (resolution, color planes, supplementary planes for
stereoscopic 3D, supplementary streams for 5.1 sound), the possibility of
reducing bandwidth usage dynamically for some programs, so that channel
producers can negotiate their mutual bandwidth needs on the multiplex,
and the ability to add or remove supplementary streams, including for
advertising or for renewing usage rights for authorized subscribers with
conforming devices. All this is also supported in the new DVB-T2 standard for
broadcasting, but the format is designed to be transportable as well over
various networking media, including fiber, DSL, mobile internet, or relayed
over VLANs. For "4K" resolution, the requirement on devices is not just on
the tuner or demuxer, but also a minimum performance level for
the codec, which will also support secondary streams for error correction,
possibly via other connections, such as correcting a received broadcast
using a separate Internet access, which may also be used to negotiate and
renew decryption keys for paid programs.]

The UltraHD logo (for use on sold products) is set accordingly (and there
is already another DVB-T2 logo for hardware decoders that are not yet ready
for UltraHD, but may become eligible later via firmware updates, because
existing DVB-T tuners will not be able to decode the signal even if they
support the necessary codecs and are able to display the 4K resolution).
For cable decoders or "boxes" proposed by ISPs, there are separate
specifications, but they are controlled by the ISP. However, they will
support the UltraHD streams and will implement the necessary virtual
networking interfaces in their router. For mobile devices, this will be
supported as long as you have support for the 4G/5G network; the rest
will be a driver update or will be supported by installable apps, but the
rendering capabilities will be limited by the GPU and screen hardware.

Anyway, aren't all these logos (not "4K", but "UltraHD" and
"DVB-T"/"DVB-T2") protected by IP rights (with specific rules about their
conforming usage, and a design for the shapes)?

From daniel.buenzli at erratique.ch  Wed Jun 22 06:10:30 2016
From: daniel.buenzli at erratique.ch (=?utf-8?Q?Daniel_B=C3=BCnzli?=)
Date: Wed, 22 Jun 2016 12:10:30 +0100
Subject: UAX 29 9.0.0 new emoji flag rules questions and comments
In-Reply-To: <CY1PR0301MB1994772BA9200CAF349CEA6EDD2C0@CY1PR0301MB1994.namprd03.prod.outlook.com>
References: <E8B8FEF093EB4288882FE9F17B89E745@erratique.ch>
 <CY1PR0301MB1994772BA9200CAF349CEA6EDD2C0@CY1PR0301MB1994.namprd03.prod.outlook.com>
Message-ID: <A774529969B6440B9E10E92D5D1D0FC6@erratique.ch>


On Wednesday, 22 June 2016 at 01:32, Laurentiu Iancu wrote:
> Re #1, the ^ symbol indeed denotes a start-of-line anchor, in usual regex notation, and the corresponding rules could use sot instead.

By the way it seems to me that an equivalent formulation of GB12/GB13 and WB15/WB16 would be to have the sequence of rules:

RI RI ÷ RI RI
RI x RI

This fits particularly well in the case of word breaking since you already need as much context as this because of the rules WB{6,7,11,12}. It also avoids regexps and negation.

Best,  

Daniel


From mark at macchiato.com  Wed Jun 22 06:32:43 2016
From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?=)
Date: Wed, 22 Jun 2016 13:32:43 +0200
Subject: UAX 29 9.0.0 new emoji flag rules questions and comments
In-Reply-To: <A774529969B6440B9E10E92D5D1D0FC6@erratique.ch>
References: <E8B8FEF093EB4288882FE9F17B89E745@erratique.ch>
 <CY1PR0301MB1994772BA9200CAF349CEA6EDD2C0@CY1PR0301MB1994.namprd03.prod.outlook.com>
 <A774529969B6440B9E10E92D5D1D0FC6@erratique.ch>
Message-ID: <CAJ2xs_GjX3oSPLDArn+jTJQkwn1BS+c6pG_B_jHH6JWgT=m=OQ@mail.gmail.com>

That wouldn't work. The process works by taking each offset, and walking
through all the rules, using the first one that matches.

So with your rules and the following input:

RI RI RI RI RI RI

You'd get that any offset with at least 2 RI on the right and on the left
would have a break, and everything else would have no break, thus:

RI x RI ÷ RI ÷ RI ÷ RI x RI
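
The process described above (visit each offset, try the rules in order, use the first one that matches) can be sketched as follows. This is a minimal illustration under stated assumptions: `RULES`, `boundaries` and `render` are hypothetical names, the rule table holds only the two proposed rules, and any offset that no rule matches defaults to a break.

```python
# The two proposed rules, in order: (context before, context after, decision).
RULES = [
    (("RI", "RI"), ("RI", "RI"), "÷"),  # RI RI ÷ RI RI
    (("RI",), ("RI",), "x"),            # RI x RI
]


def boundaries(classes):
    """Decide every interior offset independently; the first matching rule wins."""
    out = []
    for i in range(1, len(classes)):
        left, right = classes[:i], classes[i:]
        decision = "÷"  # assumed default when no rule matches
        for before, after, action in RULES:
            if (tuple(left[-len(before):]) == before
                    and tuple(right[:len(after)]) == after):
                decision = action
                break
        out.append(decision)
    return out


def render(classes):
    """Interleave the classes with the decisions taken between them."""
    parts = [classes[0]]
    for decision, cls in zip(boundaries(classes), classes[1:]):
        parts += [decision, cls]
    return " ".join(parts)


print(render(["RI"] * 6))
```

Because each offset is decided on its own, nothing ties the decision at one offset to the decisions taken at the offsets before it, which is why the six RI do not come out as three pairs.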


Mark


From daniel.buenzli at erratique.ch  Wed Jun 22 06:54:28 2016
From: daniel.buenzli at erratique.ch (=?utf-8?Q?Daniel_B=C3=BCnzli?=)
Date: Wed, 22 Jun 2016 12:54:28 +0100
Subject: UAX 29 9.0.0 new emoji flag rules questions and comments
In-Reply-To: <CAJ2xs_GjX3oSPLDArn+jTJQkwn1BS+c6pG_B_jHH6JWgT=m=OQ@mail.gmail.com>
References: <E8B8FEF093EB4288882FE9F17B89E745@erratique.ch>
 <CY1PR0301MB1994772BA9200CAF349CEA6EDD2C0@CY1PR0301MB1994.namprd03.prod.outlook.com>
 <A774529969B6440B9E10E92D5D1D0FC6@erratique.ch>
 <CAJ2xs_GjX3oSPLDArn+jTJQkwn1BS+c6pG_B_jHH6JWgT=m=OQ@mail.gmail.com>
Message-ID: <E46C746C17F34A4F97B82AE67EE388DB@erratique.ch>

On Wednesday, 22 June 2016 at 12:32, Mark Davis ☕️ wrote:
> That wouldn't work.

Ah yes, indeed. You'd need to be able to remember which boundary decisions were previously taken, i.e. have rules of the form:

RI x RI ÷ RI RI
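
One way to carry that memory is to scan left to right and keep a count of the RI seen so far in the current run. The following is only a sketch under stated assumptions: `ri_boundaries` is a hypothetical name, regional indicators are the only class handled, and every other offset simply defaults to a break.

```python
def ri_boundaries(classes):
    """Pair up regional indicators by tracking the length of the current RI run."""
    out = []
    run = 0  # number of consecutive RI immediately before the current offset
    for i in range(1, len(classes)):
        run = run + 1 if classes[i - 1] == "RI" else 0
        if classes[i] == "RI" and run % 2 == 1:
            # An odd number of RI precedes this offset: the next RI completes a pair.
            out.append("x")
        else:
            # Even run (pair already complete) or a non-RI neighbour: break.
            out.append("÷")
    return out


print(" ".join(ri_boundaries(["RI"] * 6)))
```

The run length plays the role of the remembered decisions: an odd count means the previous RI is still unpaired.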

Thanks,

Daniel


From kenwhistler at att.net  Wed Jun 22 10:06:00 2016
From: kenwhistler at att.net (Ken Whistler)
Date: Wed, 22 Jun 2016 08:06:00 -0700
Subject: Announcing The Unicode(R) Standard, Version 9.0
In-Reply-To: <CAGa7JC0HSiW+HtQYwiF9TiNYvh0YnvCCAep9ku=iHxp2PpRDKA@mail.gmail.com>
References: <CAGa7JC0HSiW+HtQYwiF9TiNYvh0YnvCCAep9ku=iHxp2PpRDKA@mail.gmail.com>
Message-ID: <3839938e-397b-1920-88e6-f14fee5783e1@att.net>


On 6/22/2016 3:33 AM, Philippe Verdy wrote:
>
>
> 2016-06-22 0:02 GMT+02:00 <announcements at unicode.org 
> <mailto:announcements at unicode.org>>:
>
>     Important symbol additions include:
>
>       * 19 symbols for the new 4K TV standard
>
> We were told that this standard is not named "4K" but "UltraHD"
> (UHD)... "4K" is just a popular informal term in English-language media, or
> used in commercial announcements, here also in English. It is not
> correctly understood everywhere, and could lead to confusion about the
> required conformance level.
>
>
... [verbose explanation of the standard] ...

>
> Anyway, aren't all these logos (not "4K", but "UltraHD" and 
> "DVB-T"/"DVB-T2") protected by IP rights (with specific rules about 
> their conforming usage, and a design for the shapes) ?
>

The characters in question are correctly identified as from the "ARIB 
STD B62" in the 9.0 code charts. We recognize that "4K" is just a 
shorthand term for the standard in question.

And while there may be specific rules for conforming usage in actual 
television implementation, the symbols in question were urgently 
requested by the Japanese National Body for inclusion in 10646 and the 
Unicode Standard, precisely to ensure Unicode *character-based* 
interchange and interoperability. See:

http://www.unicode.org/L2/L2015/15238-n4671.pdf

From that document: "Therefore, it is highly expected that the 
additional symbols in ARIB STD-B62 are safely interchanged via UCS."

These 19 symbols were accelerated for publication in Unicode 9.0 to 
ensure their availability for implementations as of 2016.

--Ken


From c933103 at gmail.com  Wed Jun 22 22:54:16 2016
From: c933103 at gmail.com (gfb hjjhjh)
Date: Thu, 23 Jun 2016 11:54:16 +0800
Subject: Announcing The Unicode(R) Standard, Version 9.0
In-Reply-To: <CAGa7JC0HSiW+HtQYwiF9TiNYvh0YnvCCAep9ku=iHxp2PpRDKA@mail.gmail.com>
References: <CAGa7JC0HSiW+HtQYwiF9TiNYvh0YnvCCAep9ku=iHxp2PpRDKA@mail.gmail.com>
Message-ID: <CAGHjPPK+AZxMLBz1x8PeFFCSYc8zmCVTvtMb5k5u_pqW9XEofQ@mail.gmail.com>

From what I understand, these symbols are from a Japanese broadcasting
standard, and I do see the Japanese government use "4K" in its official
documents, which probably explains the naming.
https://www.google.co.jp/search?q=4k+site%3A.go.jp&oq=4k+site%3A.go.jp

From otto.stolz at uni-konstanz.de  Thu Jun 23 06:50:38 2016
From: otto.stolz at uni-konstanz.de (Otto Stolz)
Date: Thu, 23 Jun 2016 13:50:38 +0200
Subject: =?UTF-8?Q?Re:_Announcing_The_Unicode=c2=ae_Standard=2c_Version_9.0?=
In-Reply-To: <5769B96E.5040804@unicode.org>
References: <5769B96E.5040804@unicode.org>
Message-ID: <576BCD0E.7030605@uni-konstanz.de>

Ciao,

il 2016-06-22 alle 00:02 announcements at unicode.org ha scritto:
> Version 9.0 of the Unicode Standard is now available.
[...]
> MOTOR SCOOTER

Almost exactly 70 years after its invention, "la vespa" has
found her way into Unicode. I have related that important news,
immediately, to the members of my Italian language class ;-)

Auguri,
   Otto


From ken.shirriff at gmail.com  Thu Jun 23 14:53:17 2016
From: ken.shirriff at gmail.com (Ken Shirriff)
Date: Thu, 23 Jun 2016 12:53:17 -0700
Subject: Adding half-star to Unicode?
Message-ID: <CALBHtZw-3RB1iO_4QY2Yjo2RKecgDKoH3adUdnZJ=Xwgqwu-LQ@mail.gmail.com>

Half-stars are used all over the place for reviews and many people have
expressed interest in a Unicode half star. I propose two new Unicode
characters: half a BLACK STAR (★) and a half-filled WHITE STAR (☆), i.e. a
half star without and with an outline. What do you think? Is there any
reason Unicode doesn't have a half star?

Ken

From gwalla at gmail.com  Thu Jun 23 16:34:40 2016
From: gwalla at gmail.com (Garth Wallace)
Date: Thu, 23 Jun 2016 14:34:40 -0700
Subject: Adding half-star to Unicode?
In-Reply-To: <CALBHtZw-3RB1iO_4QY2Yjo2RKecgDKoH3adUdnZJ=Xwgqwu-LQ@mail.gmail.com>
References: <CALBHtZw-3RB1iO_4QY2Yjo2RKecgDKoH3adUdnZJ=Xwgqwu-LQ@mail.gmail.com>
Message-ID: <CA+p4_H10Luo9c2moeKNyF2ObZAvakmaU8f6GNBhUy8Jnd7i8Rw@mail.gmail.com>

On Thu, Jun 23, 2016 at 12:53 PM, Ken Shirriff <ken.shirriff at gmail.com>
wrote:

> Half-stars are used all over the place for reviews and many people have
> expressed interest in a Unicode half star. I propose two new Unicode
> characters: half a BLACK STAR (★) and a half-filled WHITE STAR (☆), i.e. a
> half star without and with an outline. What do you think? Is there any
> reason Unicode doesn't have a half star?
>
> Ken
>

Ratings are usually sequences of stars, with any half star coming at the
end, like ★★★(half), AIUI, so it's usually the left side that's black. But
what about in right-to-left contexts? Would they be bidi-mirrored?

From verdy_p at wanadoo.fr  Thu Jun 23 16:44:10 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Thu, 23 Jun 2016 23:44:10 +0200
Subject: Adding half-star to Unicode?
In-Reply-To: <CALBHtZw-3RB1iO_4QY2Yjo2RKecgDKoH3adUdnZJ=Xwgqwu-LQ@mail.gmail.com>
References: <CALBHtZw-3RB1iO_4QY2Yjo2RKecgDKoH3adUdnZJ=Xwgqwu-LQ@mail.gmail.com>
Message-ID: <CAGa7JC2cEb-2n_U-69-gJ-toBgK1DHgb7JvNJqaQWBYXBcJizA@mail.gmail.com>

Only one of the two would be enough: the existing **full** white star (☆),
but only half filled.
However, a second one may be needed for RTL: half filling may be done on
the left or right side of the white star.
These would then be WHITE STAR WITH LEFT HALF BLACK and WHITE STAR WITH RIGHT
HALF BLACK.
Possibly we may also need top and bottom half filling (for vertically written
scripts, top to bottom or bottom to top).



From verdy_p at wanadoo.fr  Thu Jun 23 16:46:36 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Thu, 23 Jun 2016 23:46:36 +0200
Subject: Adding half-star to Unicode?
In-Reply-To: <CA+p4_H10Luo9c2moeKNyF2ObZAvakmaU8f6GNBhUy8Jnd7i8Rw@mail.gmail.com>
References: <CALBHtZw-3RB1iO_4QY2Yjo2RKecgDKoH3adUdnZJ=Xwgqwu-LQ@mail.gmail.com>
 <CA+p4_H10Luo9c2moeKNyF2ObZAvakmaU8f6GNBhUy8Jnd7i8Rw@mail.gmail.com>
Message-ID: <CAGa7JC0U3yBvP5hO8ocEX14s+R8zZY5P6oEqC3VYZabefi5Zqg@mail.gmail.com>

You're right: mirroring for RTL and for vertical presentation may avoid
creating 4 characters; only one would then be needed: HALF-BLACK WHITE STAR
...


From gwalla at gmail.com  Thu Jun 23 17:01:18 2016
From: gwalla at gmail.com (Garth Wallace)
Date: Thu, 23 Jun 2016 15:01:18 -0700
Subject: Adding half-star to Unicode?
In-Reply-To: <CAGa7JC0U3yBvP5hO8ocEX14s+R8zZY5P6oEqC3VYZabefi5Zqg@mail.gmail.com>
References: <CALBHtZw-3RB1iO_4QY2Yjo2RKecgDKoH3adUdnZJ=Xwgqwu-LQ@mail.gmail.com>
 <CA+p4_H10Luo9c2moeKNyF2ObZAvakmaU8f6GNBhUy8Jnd7i8Rw@mail.gmail.com>
 <CAGa7JC0U3yBvP5hO8ocEX14s+R8zZY5P6oEqC3VYZabefi5Zqg@mail.gmail.com>
Message-ID: <CA+p4_H22MC8kvOfrB9e-XQ0K87ecz-2g2E+v3AE_=YaxBpLzeQ@mail.gmail.com>

But precedent is for separate WITH LEFT HALF BLACK and WITH RIGHT HALF
BLACK geometric shapes.

Also, I'm not sure if the BLACK HALF STAR and STAR WITH LEFT HALF BLACK are
entirely interchangeable. I usually see the former in situations using a
variable number of glyphs, where the number of glyphs shows the rating, as
in:

★
★★★
★★★★★

while I see the latter in ratings with a fixed number of glyphs, where the
number of *filled* glyphs shows the rating, as in:

★☆☆☆☆
★★★☆☆
★★★★★

It seems like either would work in the first case, but the LEFT HALF STAR
would be awkward in the second.
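
The two display conventions described above can be sketched as follows. This is only an illustration: `variable_rating` and `fixed_rating` are hypothetical helpers, ratings are given in half-star units, and `HALF` is an ASCII placeholder standing in for whichever half-star glyph would be used.

```python
BLACK_STAR = "\u2605"  # BLACK STAR
WHITE_STAR = "\u2606"  # WHITE STAR
HALF = "(half)"        # placeholder for a half-star glyph (assumption)


def variable_rating(halves):
    """Variable number of glyphs: the glyph count itself shows the rating."""
    full, half = divmod(halves, 2)
    return BLACK_STAR * full + (HALF if half else "")


def fixed_rating(halves, total=5):
    """Fixed number of glyphs: the number of filled glyphs shows the rating."""
    full, half = divmod(halves, 2)
    empty = total - full - half
    return BLACK_STAR * full + (HALF if half else "") + WHITE_STAR * empty


print(variable_rating(7))  # three and a half stars, nothing after the half
print(fixed_rating(7))     # three and a half stars, padded to five glyphs
```

In the fixed-width form the half glyph sits between filled and empty stars, so an outlined half-and-half shape lines up with its neighbours; in the variable form it is always the last glyph.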


From verdy_p at wanadoo.fr  Thu Jun 23 17:21:46 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Fri, 24 Jun 2016 00:21:46 +0200
Subject: Adding half-star to Unicode?
In-Reply-To: <CA+p4_H22MC8kvOfrB9e-XQ0K87ecz-2g2E+v3AE_=YaxBpLzeQ@mail.gmail.com>
References: <CALBHtZw-3RB1iO_4QY2Yjo2RKecgDKoH3adUdnZJ=Xwgqwu-LQ@mail.gmail.com>
 <CA+p4_H10Luo9c2moeKNyF2ObZAvakmaU8f6GNBhUy8Jnd7i8Rw@mail.gmail.com>
 <CAGa7JC0U3yBvP5hO8ocEX14s+R8zZY5P6oEqC3VYZabefi5Zqg@mail.gmail.com>
 <CA+p4_H22MC8kvOfrB9e-XQ0K87ecz-2g2E+v3AE_=YaxBpLzeQ@mail.gmail.com>
Message-ID: <CAGa7JC1BwvBxsLhHjzdJ=nrr0n0UQG6TM_ERG0+oFzVegdSQOA@mail.gmail.com>

There are also cases where these ratings use a fixed number of stars,
but ALL of them are filled. Only the fill color changes: the rating shows,
for example, the main rating stars in a plain contrasting blue, while the other
stars are in soft grey shades (less contrasting on the background). And in this
case, there's no WHITE STAR used!



From asmusf at ix.netcom.com  Thu Jun 23 17:27:04 2016
From: asmusf at ix.netcom.com (Asmus Freytag (c))
Date: Thu, 23 Jun 2016 15:27:04 -0700
Subject: Adding half-star to Unicode?
In-Reply-To: <CA+p4_H22MC8kvOfrB9e-XQ0K87ecz-2g2E+v3AE_=YaxBpLzeQ@mail.gmail.com>
References: <CALBHtZw-3RB1iO_4QY2Yjo2RKecgDKoH3adUdnZJ=Xwgqwu-LQ@mail.gmail.com>
 <CA+p4_H10Luo9c2moeKNyF2ObZAvakmaU8f6GNBhUy8Jnd7i8Rw@mail.gmail.com>
 <CAGa7JC0U3yBvP5hO8ocEX14s+R8zZY5P6oEqC3VYZabefi5Zqg@mail.gmail.com>
 <CA+p4_H22MC8kvOfrB9e-XQ0K87ecz-2g2E+v3AE_=YaxBpLzeQ@mail.gmail.com>
Message-ID: <05f51bb0-b140-863d-decd-55d02d975e42@ix.netcom.com>

An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20160623/667ed92e/attachment.html>

From unicode at acjs.net  Thu Jun 23 17:30:27 2016
From: unicode at acjs.net (ACJ Unicode)
Date: Fri, 24 Jun 2016 00:30:27 +0200
Subject: Adding half-star to Unicode?
In-Reply-To: <CALBHtZw-3RB1iO_4QY2Yjo2RKecgDKoH3adUdnZJ=Xwgqwu-LQ@mail.gmail.com>
References: <CALBHtZw-3RB1iO_4QY2Yjo2RKecgDKoH3adUdnZJ=Xwgqwu-LQ@mail.gmail.com>
Message-ID: <203da7a5-653c-d098-75e0-001b7b589277@acjs.net>

On 23-6-2016 at 21:53, Ken Shirriff wrote:

> Half-stars are used all over the place for reviews and many people 
> have expressed interest in a Unicode half star. I propose two new 
> Unicode characters: half a BLACK STAR (★) and a half-filled WHITE STAR 
> (☆), i.e. a half star without and with an outline. What do you think? 
> Is there any reason Unicode doesn't have a half star?

+1

I was actually planning to write a proposal for this.


Alexander

From kenwhistler at att.net  Thu Jun 23 17:35:08 2016
From: kenwhistler at att.net (Ken Whistler)
Date: Thu, 23 Jun 2016 15:35:08 -0700
Subject: Adding half-star to Unicode?
In-Reply-To: <CA+p4_H22MC8kvOfrB9e-XQ0K87ecz-2g2E+v3AE_=YaxBpLzeQ@mail.gmail.com>
References: <CALBHtZw-3RB1iO_4QY2Yjo2RKecgDKoH3adUdnZJ=Xwgqwu-LQ@mail.gmail.com>
 <CA+p4_H10Luo9c2moeKNyF2ObZAvakmaU8f6GNBhUy8Jnd7i8Rw@mail.gmail.com>
 <CAGa7JC0U3yBvP5hO8ocEX14s+R8zZY5P6oEqC3VYZabefi5Zqg@mail.gmail.com>
 <CA+p4_H22MC8kvOfrB9e-XQ0K87ecz-2g2E+v3AE_=YaxBpLzeQ@mail.gmail.com>
Message-ID: <4987b2cc-3a4f-ca0b-c638-3f534bfbbfb2@att.net>


On 6/23/2016 3:01 PM, Garth Wallace wrote:
> But precedent is for separate WITH LEFT HALF BLACK and WITH RIGHT HALF 
> BLACK geometric shapes.
>
> Also, I'm not sure if the BLACK HALF STAR and STAR WITH LEFT HALF 
> BLACK are entirely interchangeable.

I agree. If we are going to do this, a set of 4 geometric symbols makes 
sense: the half black star, left and right, and the half and half 
black/white star, left and right.

We aren't likely to start down the road of making this kind of symbol 
tally display automatic bidi-mirroring. These aren't math operators -- 
but the half stars would be useful simply as additional symbols. And if 
somebody can turn up convincing evidence of use of star symbols cut in 
half on a horizontal axis, those would be interesting, too.

Oh, and please don't come around next asking for a green rotten splat 
and a red certified fresh tomato!!

--Ken


From leob at mailcom.com  Thu Jun 23 17:37:56 2016
From: leob at mailcom.com (Leo Broukhis)
Date: Thu, 23 Jun 2016 15:37:56 -0700
Subject: Adding half-star to Unicode?
In-Reply-To: <CALBHtZw-3RB1iO_4QY2Yjo2RKecgDKoH3adUdnZJ=Xwgqwu-LQ@mail.gmail.com>
References: <CALBHtZw-3RB1iO_4QY2Yjo2RKecgDKoH3adUdnZJ=Xwgqwu-LQ@mail.gmail.com>
Message-ID: <CAFmvRscWXhjXaim1Qas-gF9OvXA7dwc5s3k9BTt6ZgFM-RwL7w@mail.gmail.com>

For a previous discussion on the topic, please see
the thread "Missing geometric shapes" around 11/12/12

Leo


From textexin at xencraft.com  Thu Jun 23 18:06:04 2016
From: textexin at xencraft.com (Tex Texin)
Date: Thu, 23 Jun 2016 16:06:04 -0700
Subject: Adding half-star to Unicode?
In-Reply-To: <4987b2cc-3a4f-ca0b-c638-3f534bfbbfb2@att.net>
References: <CALBHtZw-3RB1iO_4QY2Yjo2RKecgDKoH3adUdnZJ=Xwgqwu-LQ@mail.gmail.com>
 <CA+p4_H10Luo9c2moeKNyF2ObZAvakmaU8f6GNBhUy8Jnd7i8Rw@mail.gmail.com>
 <CAGa7JC0U3yBvP5hO8ocEX14s+R8zZY5P6oEqC3VYZabefi5Zqg@mail.gmail.com>
 <CA+p4_H22MC8kvOfrB9e-XQ0K87ecz-2g2E+v3AE_=YaxBpLzeQ@mail.gmail.com>
 <4987b2cc-3a4f-ca0b-c638-3f534bfbbfb2@att.net>
Message-ID: <002101d1cda3$d1d2c6f0$757854d0$@xencraft.com>

I would have to check to see whether they are actually used, but I suspect using stars in RTL markets is not the best choice of symbols...

When you look at the number of symbols used for ratings or more general valuations, rather than adding horizontal and vertical shading for each of them, it might be better to have modifier characters for the four half-filled colorings. Or, to generalize further, they can just be half modifiers, and the presentation can decide whether the coloring is halved, the image itself is only half-shown, or some other way of indicating "halved" is used (e.g. a half-eaten tomato...).

tex

-----Original Message-----
From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Ken Whistler
Sent: Thursday, June 23, 2016 3:35 PM
To: Garth Wallace
Cc: unicode at unicode.org
Subject: Re: Adding half-star to Unicode?


On 6/23/2016 3:01 PM, Garth Wallace wrote:
> But precedent is for separate WITH LEFT HALF BLACK and WITH RIGHT HALF 
> BLACK geometric shapes.
>
> Also, I'm not sure if the BLACK HALF STAR and STAR WITH LEFT HALF 
> BLACK are entirely interchangeable.

I agree. If we are going to do this, a set of 4 geometric symbols makes
sense: the half black star, left and right, and the half and half black/white star, left and right.

We aren't likely to start down the road of making this kind of symbol tally display automatic bidi-mirroring. These aren't math operators -- but the half stars just as more symbols would be useful. And if somebody can turn up convincing evidence of use of star symbols cut in half on a horizontal axis, those would be interesting, too.

Oh, and please don't come around next asking for a green rotten splat and a red certified fresh tomato!!

--Ken
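
On the bidi-mirroring point, here is a minimal sketch using Python's standard unicodedata module. It shows that the existing left/right half-black geometric shapes cited as precedent are separate characters that do not carry the Bidi_Mirrored property, unlike a math operator such as "<". The choice of sample characters is an assumption made for illustration; it does not come from the thread.

```python
import unicodedata

# Existing left/right half-black shapes, BLACK STAR, and one math operator
# for contrast. The sample set is illustrative only.
samples = ["\u25D0", "\u25D1", "\u25E7", "\u25E8", "\u2605", "<"]

def describe(ch):
    """Return (code point, character name, Bidi_Mirrored flag)."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), bool(unicodedata.mirrored(ch)))

for row in (describe(ch) for ch in samples):
    # Only ASCII is printed, so this is safe on any terminal encoding.
    print(*row, sep="  ")
```

If half stars followed the same pattern, a tally in RTL text would need the writer (or the software) to pick the left or right variant explicitly.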


From frederic.grosshans at gmail.com  Fri Jun 24 07:12:31 2016
From: frederic.grosshans at gmail.com (=?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Grosshans?=)
Date: Fri, 24 Jun 2016 14:12:31 +0200
Subject: Adding half-star to Unicode?
In-Reply-To: <CAFmvRscWXhjXaim1Qas-gF9OvXA7dwc5s3k9BTt6ZgFM-RwL7w@mail.gmail.com>
References: <CALBHtZw-3RB1iO_4QY2Yjo2RKecgDKoH3adUdnZJ=Xwgqwu-LQ@mail.gmail.com>
 <CAFmvRscWXhjXaim1Qas-gF9OvXA7dwc5s3k9BTt6ZgFM-RwL7w@mail.gmail.com>
Message-ID: <576D23AF.2050003@gmail.com>

On 24/06/2016 at 00:37, Leo Broukhis wrote:
> For a previous discussion on the topic, please see
> the thread "Missing geometric shapes" around 11/12/12
The thread starts here:
http://www.unicode.org/mail-arch/unicode-ml/y2012-m11/0008.html

It contains an example of a half-filled star used in an RTL (Hebrew) context,
in an advertisement in Haaretz, here:
http://www.unicode.org/mail-arch/unicode-ml/y2012-m11/0024.html


From jknappen at web.de  Fri Jun 24 09:04:18 2016
From: jknappen at web.de (=?UTF-8?Q?=22J=C3=B6rg_Knappen=22?=)
Date: Fri, 24 Jun 2016 16:04:18 +0200
Subject: Aw: Re: Adding half-star to Unicode?
In-Reply-To: <576D23AF.2050003@gmail.com>
References: <CALBHtZw-3RB1iO_4QY2Yjo2RKecgDKoH3adUdnZJ=Xwgqwu-LQ@mail.gmail.com>
 <CAFmvRscWXhjXaim1Qas-gF9OvXA7dwc5s3k9BTt6ZgFM-RwL7w@mail.gmail.com>,
 <576D23AF.2050003@gmail.com>
Message-ID: <trinity-42940c38-4a7e-4275-b34e-5ee400a78530-1466777058484@3capp-webde-bs26>

An HTML attachment was scrubbed...
URL: <http://unicode.org/pipermail/unicode/attachments/20160624/2e368912/attachment.html>

From tfujiwar at redhat.com  Fri Jun 24 00:21:29 2016
From: tfujiwar at redhat.com (Takao Fujiwara)
Date: Fri, 24 Jun 2016 14:21:29 +0900
Subject: Emoji and Annotation data
Message-ID: <07d3a922-b9a3-e8cb-df09-746796c8e0d4@redhat.com>

Hi,

I'm working on IBus - the input method framework for Linux.
I parse http://unicode.org/emoji/charts/emoji-list.html and build a dictionary mapping the annotations to the emoji characters.
Since the file is large and often updated, I'm wondering how best to maintain it.

I copied the file as http://ibus.github.io/files/ibus/emoji-list.html for the build at the moment.

I have two questions:
  - Does unicode.org provide a tarball of stable HTML files or other data?
  - What is the license of the HTML files?

Do you have any ideas?

Thanks,
Fujiwara

From gwalla at gmail.com  Fri Jun 24 10:55:45 2016
From: gwalla at gmail.com (Garth Wallace)
Date: Fri, 24 Jun 2016 08:55:45 -0700
Subject: Adding half-star to Unicode?
In-Reply-To: <trinity-42940c38-4a7e-4275-b34e-5ee400a78530-1466777058484@3capp-webde-bs26>
References: <CALBHtZw-3RB1iO_4QY2Yjo2RKecgDKoH3adUdnZJ=Xwgqwu-LQ@mail.gmail.com>
 <CAFmvRscWXhjXaim1Qas-gF9OvXA7dwc5s3k9BTt6ZgFM-RwL7w@mail.gmail.com>
 <576D23AF.2050003@gmail.com>
 <trinity-42940c38-4a7e-4275-b34e-5ee400a78530-1466777058484@3capp-webde-bs26>
Message-ID: <CA+p4_H2OjgYxpkpY0PPUjb+qX3wxLhWtwFYZv1Bfy-RYTXW0Ng@mail.gmail.com>

But would anarchists even want their symbol to be encoded?

On Fri, Jun 24, 2016 at 7:04 AM, "Jörg Knappen" <jknappen at web.de> wrote:

> Talking about fancy five-pointed stars, besides the vertically split ones
> there is the "Anarchist star" (a symbol for anarcho-syndicalism)
> with a diagonal split into an upper left red half and a lower right black
> half. Since there are political and ideological symbols encoded
> in Unicode, maybe this one is worth encoding as well (probably twice, once
> as a black and white plain symbol and once as a colourful emoji).
>
> See here:
> https://commons.wikimedia.org/wiki/Category:Anarcho-Syndicalism#/media/File:Anarchist_star.svg
>
> FIVE POINTED STAR WITH BLACK LOWER RIGHT HALF = anarchist star
> ANARCHIST STAR EMOJI
>
> --Jörg Knappen

From mark at macchiato.com  Fri Jun 24 11:04:40 2016
From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?=)
Date: Fri, 24 Jun 2016 18:04:40 +0200
Subject: Emoji and Annotation data
In-Reply-To: <07d3a922-b9a3-e8cb-df09-746796c8e0d4@redhat.com>
References: <07d3a922-b9a3-e8cb-df09-746796c8e0d4@redhat.com>
Message-ID: <CAJ2xs_E4_4xSvJOGXjPZzcE-JowC+Hwv5d2BH1+TY1dr5-fe=Q@mail.gmail.com>

You should never be scraping *any* Unicode HTML files. They are not made
for that, and there is no guarantee of stability.

The emoji files are built from data which is described in
http://www.unicode.org/reports/tr51/
(plus CLDR annotations and collation)

Mark
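
As a rough illustration of working from data files rather than scraped HTML, here is a minimal Python sketch that builds a keyword-to-emoji dictionary from an annotations XML document. The element and attribute names in the inline sample (annotation, cp, keywords separated by "|") are assumptions made for this sketch; check the actual CLDR release you use, since the layout may differ.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Inline sample only. The layout is an assumption for illustration,
# not a copy of a real CLDR annotations file.
SAMPLE = """<ldml>
  <annotations>
    <annotation cp="\U0001F600">face | grin</annotation>
    <annotation cp="\U0001F63A">cat | face | grin</annotation>
  </annotations>
</ldml>"""

def build_index(xml_text):
    """Map each keyword to the set of emoji annotated with it."""
    index = defaultdict(set)
    root = ET.fromstring(xml_text)
    for node in root.iter("annotation"):
        emoji = node.get("cp")
        for keyword in (node.text or "").split("|"):
            keyword = keyword.strip()
            if keyword:
                index[keyword].add(emoji)
    return index

index = build_index(SAMPLE)
# Print code points rather than glyphs to stay terminal-encoding safe.
print(sorted(f"U+{ord(e):04X}" for e in index["grin"]))
```

An input method could ship such a file alongside its own code and rebuild the dictionary at build time, so updates only require replacing the data file.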


From verdy_p at wanadoo.fr  Fri Jun 24 12:10:06 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Fri, 24 Jun 2016 19:10:06 +0200
Subject: Adding half-star to Unicode?
In-Reply-To: <CA+p4_H2OjgYxpkpY0PPUjb+qX3wxLhWtwFYZv1Bfy-RYTXW0Ng@mail.gmail.com>
References: <CALBHtZw-3RB1iO_4QY2Yjo2RKecgDKoH3adUdnZJ=Xwgqwu-LQ@mail.gmail.com>
 <CAFmvRscWXhjXaim1Qas-gF9OvXA7dwc5s3k9BTt6ZgFM-RwL7w@mail.gmail.com>
 <576D23AF.2050003@gmail.com>
 <trinity-42940c38-4a7e-4275-b34e-5ee400a78530-1466777058484@3capp-webde-bs26>
 <CA+p4_H2OjgYxpkpY0PPUjb+qX3wxLhWtwFYZv1Bfy-RYTXW0Ng@mail.gmail.com>
Message-ID: <CAGa7JC29YyADdFvVaup0R0f34a1Qn2qpiXDFZMijLmG3viOO4A@mail.gmail.com>

My bet is that they'll prefer to use whatever code they want, hacking fonts
as necessary to take over another political symbol whenever they like. They
could do that easily with webfonts today (by designing a tiny webfont with
just one glyph mapped to any code point, including an ASCII symbol such as
the DOLLAR SIGN), or by remapping some ASCII-art string (like the classic
emoticons of Usenet). They would probably refuse any standardization and
would not even use the code point proposed for them: if we so much as
attempt to define standard colors or a glyph design, they'll invent another
incompatible design, change the colors, rotate it, turn it into an exploding
star... However, the historic anarchist symbol seen on walls and painted
banners in Europe in the 19th and early 20th centuries was plain black.

And it was not really a star, but derived from the letter A in a circle,
with the horizontal bar frequently replaced by a firearm, or slanted and
looking more like a thin arrowhead pointing slightly upward. (Various
decorations could be added on top: a striker throwing a Molotov cocktail,
flowers, a plus sign, or a "V" for "victory".) The strokes were most often
very irregular, as if brushed very rapidly onto a wall. More polished forms
have also been used, with a standard A in a circle open at the bottom and a
small curved leg. Not all of them want flags with colors. Other groups just
use a standard red-filled 5-pointed star on a plain black background.

In London even today, there is most often no star, just a red and black
flag (colors split along the diagonal). Either the red side or the black side
may be attached to the pole, but generally the black side is below the right
side. The red color also varies (green, dark purple, pink, orange,
white...), but the black seems to be always there (even with just the
classic circled A, black may be used to fill the glyph or the background).
There's no dedicated medium: the symbols may be used anywhere, integrated
into all sorts of graphics, and made with various materials.

The flag may be flown in any orientation. In Australia, it is a vertical
rainbow over a black area.

Other symbols of anarchism include a closed fist raised upward (as a sign
of protest) with a venomous snake. Anarchist movements have always been
inventive and have protested against all sorts of political regimes,
democratic or not; in fact, they protest against all forms of state
government and their official symbols.


From leoboiko at namakajiri.net  Fri Jun 24 12:20:51 2016
From: leoboiko at namakajiri.net (Leonardo Boiko)
Date: Fri, 24 Jun 2016 14:20:51 -0300
Subject: Adding half-star to Unicode?
In-Reply-To: <CAGa7JC29YyADdFvVaup0R0f34a1Qn2qpiXDFZMijLmG3viOO4A@mail.gmail.com>
References: <CALBHtZw-3RB1iO_4QY2Yjo2RKecgDKoH3adUdnZJ=Xwgqwu-LQ@mail.gmail.com>
 <CAFmvRscWXhjXaim1Qas-gF9OvXA7dwc5s3k9BTt6ZgFM-RwL7w@mail.gmail.com>
 <576D23AF.2050003@gmail.com>
 <trinity-42940c38-4a7e-4275-b34e-5ee400a78530-1466777058484@3capp-webde-bs26>
 <CA+p4_H2OjgYxpkpY0PPUjb+qX3wxLhWtwFYZv1Bfy-RYTXW0Ng@mail.gmail.com>
 <CAGa7JC29YyADdFvVaup0R0f34a1Qn2qpiXDFZMijLmG3viOO4A@mail.gmail.com>
Message-ID: <CAJ6uix73ZbSMSxJ5RMhXtPV7C_zt-qprtNeO4N5+aSUhn1J0zQ@mail.gmail.com>

> My bet is that they'll prefer using whatever code they want, hacking
> fonts as necessary to overtake another political symbol when they'll want.


They could liberate a code point from the private use area.



From verdy_p at wanadoo.fr  Fri Jun 24 12:23:19 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Fri, 24 Jun 2016 19:23:19 +0200
Subject: Adding half-star to Unicode?
In-Reply-To: <CAJ6uix73ZbSMSxJ5RMhXtPV7C_zt-qprtNeO4N5+aSUhn1J0zQ@mail.gmail.com>
References: <CALBHtZw-3RB1iO_4QY2Yjo2RKecgDKoH3adUdnZJ=Xwgqwu-LQ@mail.gmail.com>
 <CAFmvRscWXhjXaim1Qas-gF9OvXA7dwc5s3k9BTt6ZgFM-RwL7w@mail.gmail.com>
 <576D23AF.2050003@gmail.com>
 <trinity-42940c38-4a7e-4275-b34e-5ee400a78530-1466777058484@3capp-webde-bs26>
 <CA+p4_H2OjgYxpkpY0PPUjb+qX3wxLhWtwFYZv1Bfy-RYTXW0Ng@mail.gmail.com>
 <CAGa7JC29YyADdFvVaup0R0f34a1Qn2qpiXDFZMijLmG3viOO4A@mail.gmail.com>
 <CAJ6uix73ZbSMSxJ5RMhXtPV7C_zt-qprtNeO4N5+aSUhn1J0zQ@mail.gmail.com>
Message-ID: <CAGa7JC1oUMzpWfQ35ifitKWkTC8ZN8Jb1zC+MvZCX7B-bVtvvQ@mail.gmail.com>

Or just reuse the code already assigned to the circled A (the most common
basic symbol), ignoring the many variants of shapes and colors.

2016-06-24 19:20 GMT+02:00 Leonardo Boiko <leoboiko at namakajiri.net>:

> > My bet is that they'll prefer using whatever code they want, hacking
> fonts as necessary to overtake another political symbol when they'll want.
>
>
> They could liberate a code point from the private use area.
>
>

From costello at mitre.org  Sun Jun 26 03:37:32 2016
From: costello at mitre.org (Costello, Roger L.)
Date: Sun, 26 Jun 2016 08:37:32 +0000
Subject: Are there Unicode symbols for parenthesis generator symbols?
Message-ID: <BL2PR09MB10110F238378FB6D25710E51C8200@BL2PR09MB1011.namprd09.prod.outlook.com>

Hi Folks,

In the book Parsing Techniques the authors use a less-than symbol with a dot tucked inside for the open parenthesis and a greater-than symbol with a dot tucked inside for the close parenthesis. They also use an equals sign with a dot over it. You can see the 3 symbols here:

https://books.google.com/books?id=05xA_d5dSwAC&pg=PA267&lpg=PA267&dq=parenthesis+generator+symbols&source=bl&ots=3OwyeBndO8&sig=ZhwoeYRJjm3GTzNNP1vgsAVRisc&hl=en&sa=X&sqi=2&ved=0ahUKEwi577X-o8XNAhWBaz4KHc0QA_EQ6AEIIzAB#v=onepage&q=parenthesis%20generator%20symbols&f=false

Are there Unicode characters for these 3 symbols?

/Roger


From ori at avtalion.name  Sun Jun 26 04:12:27 2016
From: ori at avtalion.name (Ori Avtalion)
Date: Sun, 26 Jun 2016 12:12:27 +0300
Subject: Emoji and Annotation data
In-Reply-To: <07d3a922-b9a3-e8cb-df09-746796c8e0d4@redhat.com>
References: <07d3a922-b9a3-e8cb-df09-746796c8e0d4@redhat.com>
Message-ID: <CALgdb5+_efZxnQksAjcJd9Gp-pKw+GiM6L2MHWT9gijOFN54bA@mail.gmail.com>

Hey,

I maintain an IBus module(?) that allows inputting emojis [1] (I think
I mentioned it before on IRC).
I use the data provided by EmojiOne, which also includes aliases and
the popular (but unofficial) "shortnames". You might find it useful
[2].

[1] https://github.com/salty-horse/ibus-uniemoji
[2] https://github.com/Ranks/emojione/emoji.json


From andrewcwest at gmail.com  Sun Jun 26 04:38:28 2016
From: andrewcwest at gmail.com (Andrew West)
Date: Sun, 26 Jun 2016 10:38:28 +0100
Subject: Are there Unicode symbols for parenthesis generator symbols?
In-Reply-To: <BL2PR09MB10110F238378FB6D25710E51C8200@BL2PR09MB1011.namprd09.prod.outlook.com>
References: <BL2PR09MB10110F238378FB6D25710E51C8200@BL2PR09MB1011.namprd09.prod.outlook.com>
Message-ID: <CALgEMhzRgDWajVugSOEhXP9=vnRt1Vwn5PbZ0crRoQg1tcphdA@mail.gmail.com>

On 26 June 2016 at 09:37, Costello, Roger L. <costello at mitre.org> wrote:
>
> In the book Parsing Techniques the authors use a less-than symbol with a dot tucked inside for the open parenthesis and a greater-than symbol with a dot tucked inside for the close parenthesis. They also use an equals sign with a dot over it. You can see the 3 symbols here:
>
> https://books.google.com/books?id=05xA_d5dSwAC&pg=PA267&lpg=PA267&dq=parenthesis+generator+symbols&source=bl&ots=3OwyeBndO8&sig=ZhwoeYRJjm3GTzNNP1vgsAVRisc&hl=en&sa=X&sqi=2&ved=0ahUKEwi577X-o8XNAhWBaz4KHc0QA_EQ6AEIIzAB#v=onepage&q=parenthesis%20generator%20symbols&f=false
>
> Are there Unicode symbols for the 3 symbols?

Yes, and they have all been around since Unicode 1.0:

U+22D6 ⋖
U+22D7 ⋗
U+2250 ≐ (named APPROACHES THE LIMIT)

Andrew
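
For anyone who wants to verify these locally, a small sketch with Python's standard unicodedata module prints the formal character names for the three code points listed above. The helper name label is just an illustrative choice.

```python
import unicodedata

def label(cp):
    """Format a code point as 'U+XXXX NAME' using the Unicode character name."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"

# The three code points given in the reply above.
for cp in (0x22D6, 0x22D7, 0x2250):
    print(label(cp))
```

The names make it clear that the first two are described by shape, while the third is named for one of its mathematical uses.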


From verdy_p at wanadoo.fr  Sun Jun 26 08:00:56 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sun, 26 Jun 2016 15:00:56 +0200
Subject: Are there Unicode symbols for parenthesis generator symbols?
In-Reply-To: <CALgEMhzRgDWajVugSOEhXP9=vnRt1Vwn5PbZ0crRoQg1tcphdA@mail.gmail.com>
References: <BL2PR09MB10110F238378FB6D25710E51C8200@BL2PR09MB1011.namprd09.prod.outlook.com>
 <CALgEMhzRgDWajVugSOEhXP9=vnRt1Vwn5PbZ0crRoQg1tcphdA@mail.gmail.com>
Message-ID: <CAGa7JC3HQd2-dWSQXLeYP-RsStfGa=yYORap9QTAt38Rk6ed4A@mail.gmail.com>

But there are also variants of U+2264 (≤) and U+2265 (≥) with dots within
the bracket (starting page 973 in the same book) for "weak precedence" of
operators...

These variants (used to combine ⋖ or ⋗ with ≐) don't seem to be encoded.



From andrewcwest at gmail.com  Sun Jun 26 08:12:56 2016
From: andrewcwest at gmail.com (Andrew West)
Date: Sun, 26 Jun 2016 14:12:56 +0100
Subject: Are there Unicode symbols for parenthesis generator symbols?
In-Reply-To: <CAGa7JC3HQd2-dWSQXLeYP-RsStfGa=yYORap9QTAt38Rk6ed4A@mail.gmail.com>
References: <BL2PR09MB10110F238378FB6D25710E51C8200@BL2PR09MB1011.namprd09.prod.outlook.com>
 <CALgEMhzRgDWajVugSOEhXP9=vnRt1Vwn5PbZ0crRoQg1tcphdA@mail.gmail.com>
 <CAGa7JC3HQd2-dWSQXLeYP-RsStfGa=yYORap9QTAt38Rk6ed4A@mail.gmail.com>
Message-ID: <CALgEMhzXSiUsmosPE-A1cmCSbDLJNK7sPdz9xMSL4hacuNiwZQ@mail.gmail.com>

On 26 June 2016 at 14:00, Philippe Verdy <verdy_p at wanadoo.fr> wrote:
> But there are also variants of U+2264 (≤) and U+2265 (≥) with dots within
> the bracket (starting page 973 in the same book) for "weak precedence" of
> operators...

starting page 273

> These variants (used to combine ⋖ or ⋗ with ≐) don't seem to be encoded.

No, but there are U+2A7F ⩿ and U+2A80 ⪀, with slanted equals, which might suffice.

Andrew
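
One way to hunt for candidates like these is to scan character names. The sketch below (Python, standard unicodedata module) lists characters in an assumed range, U+2200..U+2AFF, whose names start with LESS-THAN or GREATER-THAN and mention DOT. The range and the name filter are assumptions chosen for this example, not an exhaustive search method.

```python
import unicodedata

def dotted_comparisons(start=0x2200, end=0x2AFF):
    """Return (code point, name) pairs for dotted less/greater-than signs."""
    hits = []
    for cp in range(start, end + 1):
        # Unassigned code points have no name; use "" so they are skipped.
        name = unicodedata.name(chr(cp), "")
        if name.startswith(("LESS-THAN", "GREATER-THAN")) and "DOT" in name:
            hits.append((cp, name))
    return hits

for cp, name in dotted_comparisons():
    print(f"U+{cp:04X} {name}")
```

A name-based scan like this misses characters named for their use rather than their shape, such as U+2250 APPROACHES THE LIMIT, so it complements a look through the code charts rather than replacing it.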


From verdy_p at wanadoo.fr  Sun Jun 26 08:13:11 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Sun, 26 Jun 2016 15:13:11 +0200
Subject: Are there Unicode symbols for parenthesis generator symbols?
In-Reply-To: <CAGa7JC3HQd2-dWSQXLeYP-RsStfGa=yYORap9QTAt38Rk6ed4A@mail.gmail.com>
References: <BL2PR09MB10110F238378FB6D25710E51C8200@BL2PR09MB1011.namprd09.prod.outlook.com>
 <CALgEMhzRgDWajVugSOEhXP9=vnRt1Vwn5PbZ0crRoQg1tcphdA@mail.gmail.com>
 <CAGa7JC3HQd2-dWSQXLeYP-RsStfGa=yYORap9QTAt38Rk6ed4A@mail.gmail.com>
Message-ID: <CAGa7JC2BwCLb1Qs6-RiSdFz2Q+5POuJ_kqbPNu4SnNs6Cux-Pw@mail.gmail.com>

The encoded variants are U+2A7F (⩿) and U+2A80 (⪀), but with the lower bar
slanted rather than horizontal.
Maybe the horizontal forms could be encoded with variation selectors (as for
the two known variants of ? and ?)?



From tfujiwar at redhat.com  Sun Jun 26 23:09:47 2016
From: tfujiwar at redhat.com (Takao Fujiwara)
Date: Mon, 27 Jun 2016 13:09:47 +0900
Subject: Emoji and Annotation data
In-Reply-To: <CAJ2xs_E4_4xSvJOGXjPZzcE-JowC+Hwv5d2BH1+TY1dr5-fe=Q@mail.gmail.com>
References: <07d3a922-b9a3-e8cb-df09-746796c8e0d4@redhat.com>
 <CAJ2xs_E4_4xSvJOGXjPZzcE-JowC+Hwv5d2BH1+TY1dr5-fe=Q@mail.gmail.com>
Message-ID: <2985c941-9c12-7b9f-7ec3-1fcf49ee5ea5@redhat.com>

On 06/25/16 01:04, Mark Davis ☕️-san wrote:
> You should never be scraping /any/ Unicode HTML files. They are not made for that, and there is no guarantee of stability.

I cannot find a license or any description for the HTML files.

>
> The emoji files are built from data which is described in http://www.unicode.org/reports/tr51/
> (plus CLDR annotations and collation)

OK. I need data that pairs each emoji's code points with its annotations.
It would be great if that data were provided separately from the HTML files.

Thanks,
Fujiwara



From tfujiwar at redhat.com  Sun Jun 26 23:13:55 2016
From: tfujiwar at redhat.com (Takao Fujiwara)
Date: Mon, 27 Jun 2016 13:13:55 +0900
Subject: Emoji and Annotation data
In-Reply-To: <CALgdb5+_efZxnQksAjcJd9Gp-pKw+GiM6L2MHWT9gijOFN54bA@mail.gmail.com>
References: <07d3a922-b9a3-e8cb-df09-746796c8e0d4@redhat.com>
 <CALgdb5+_efZxnQksAjcJd9Gp-pKw+GiM6L2MHWT9gijOFN54bA@mail.gmail.com>
Message-ID: <2e621021-44bd-12f0-932a-a1d6b50c361b@redhat.com>

Thanks for that info and contribution.
Probably I will package the emojione for Fedora to use emoji.json.

Why don't you use only annotations? E.g. "us" hits too many Emoji.

Fujiwara

On 06/26/16 18:12, Ori Avtalion-san wrote:
> Hey,
>
> I maintain an IBus module(?) that allows inputting emojis [1] (I think
> I mentioned it before on IRC).
> I use the data provided by EmojiOne, which also includes aliases and
> the popular (but unofficial) "shortnames". You might find it useful
> [2].
>
> [1] https://github.com/salty-horse/ibus-uniemoji
> [2] https://github.com/Ranks/emojione/emoji.json
>
> On Fri, Jun 24, 2016 at 8:21 AM, Takao Fujiwara <tfujiwar at redhat.com> wrote:
>> Hi,
>>
>> I'm working on IBus - the input method framework for Linux.
>> I parse http://unicode.org/emoji/charts/emoji-list.html and create a
>> dictionary between the annotations and the Emoji characters.
>> Since the file size is large and it's often updated, I'm thinking how to
>> maintain the file.
>>
>> I copied the file as http://ibus.github.io/files/ibus/emoji-list.html for
>> the build at the moment.
>>
>> I have questions:
>>  - if unicode.org provides the tarball of the stable html files or other
>> data.
>>  - what is the license of the html files.
>>
>> Do you have any ideas?
>>
>> Thanks,
>> Fujiwara
>


From tfujiwar at redhat.com  Mon Jun 27 00:34:59 2016
From: tfujiwar at redhat.com (Takao Fujiwara)
Date: Mon, 27 Jun 2016 14:34:59 +0900
Subject: Emoji and Annotation data
In-Reply-To: <A77D9984-0F99-45BD-BA53-A833B95E6B7F@me.com>
References: <07d3a922-b9a3-e8cb-df09-746796c8e0d4@redhat.com>
 <CAJ2xs_E4_4xSvJOGXjPZzcE-JowC+Hwv5d2BH1+TY1dr5-fe=Q@mail.gmail.com>
 <2985c941-9c12-7b9f-7ec3-1fcf49ee5ea5@redhat.com>
 <A77D9984-0F99-45BD-BA53-A833B95E6B7F@me.com>
Message-ID: <6ad6f653-7d3d-412d-f914-51eade9ebe9a@redhat.com>

Hi,

E.g. http://unicode.org/emoji/charts/emoji-list.html
  "😀" has the annotations of "face" and "grin".

The data is available only in the HTML files.

Fujiwara

On 06/27/16 14:16, Peter Edberg-san wrote:
> Fujiwara-san,
> If you follow the information indicated by UTR 51 (as Mark had suggested), you will see that:
>
> 1. The annotations data is available in CLDR here, in English:
> http://unicode.org/cldr/trac/browser/tags/latest/common/annotations/en.xml
> (or in many other languages, such as Japanese:)
> http://unicode.org/cldr/trac/browser/tags/latest/common/annotations/ja.xml
>
> The description of the format for those xml files is here:
> http://www.unicode.org/reports/tr35/tr35-general.html#Annotations
>
> 2. Other emoji data files are here:
> http://www.unicode.org/Public/emoji/latest/
>
> These data files are what drive the generation of the charts.
>
> Best regards,
> Peter Edberg
>
>
>
>> On Jun 26, 2016, at 9:09 PM, Takao Fujiwara <tfujiwar at redhat.com> wrote:
>>
>> On 06/25/16 01:04, Mark Davis ??-san wrote:
>>> You should never be scraping /any/ Unicode HTML files. They are not made for that, and there is no guarantee of stability.
>>
>> I cannot find the license or descriptions about the HTML files.
>>
>>>
>>> The emoji files are built from data which is described in http://www.unicode.org/reports/tr51/
>>> (plus CLDR annotations and collation)
>>
>> OK, I need the data which packages Emoji unicode and the annotation.
>> It would be great if the data could be provided besides the html files.
>>
>> Thanks,
>> Fujiwara
>>
>>>
>>> Mark
>>> //////
>>>
>>> On Fri, Jun 24, 2016 at 7:21 AM, Takao Fujiwara <tfujiwar at redhat.com <mailto:tfujiwar at redhat.com>> wrote:
>>>
>>>    Hi,
>>>
>>>    I'm working on IBus - the input method framework for Linux.
>>>    I parse http://unicode.org/emoji/charts/emoji-list.html and create a dictionary between the annotations and the Emoji characters.
>>>    Since the file size is large and it's often updated, I'm thinking how to maintain the file.
>>>
>>>    I copied the file as http://ibus.github.io/files/ibus/emoji-list.html for the build at the moment.
>>>
>>>    I have questions:
>>>     - if unicode.org <http://unicode.org> provides the tarball of the stable html files or other data.
>>>     - what is the license of the html files.
>>>
>>>    Do you have any ideas?
>>>
>>>    Thanks,
>>>    Fujiwara
>>>
>>>
>>
>
>


From ori at avtalion.name  Mon Jun 27 01:58:45 2016
From: ori at avtalion.name (Ori Avtalion)
Date: Mon, 27 Jun 2016 09:58:45 +0300
Subject: Emoji and Annotation data
In-Reply-To: <2e621021-44bd-12f0-932a-a1d6b50c361b@redhat.com>
References: <07d3a922-b9a3-e8cb-df09-746796c8e0d4@redhat.com>
 <CALgdb5+_efZxnQksAjcJd9Gp-pKw+GiM6L2MHWT9gijOFN54bA@mail.gmail.com>
 <2e621021-44bd-12f0-932a-a1d6b50c361b@redhat.com>
Message-ID: <CALgdb5KvFxSTQAqsSjw16=ibUd7531y7vCT0EErSbfkOSiiYwA@mail.gmail.com>

On Mon, Jun 27, 2016 at 7:13 AM, Takao Fujiwara <tfujiwar at redhat.com> wrote:
> Why you don't use only annotations? E.g. "us" hits too many Emoji.

It's for all kinds of Unicode symbols, not just those that have emoji
representation.
Sometimes I find myself searching by the "real" Unicode name, and
sometimes by keyword, if I don't know what I'm looking for.

I keep tweaking it to provide better results, and I'm pretty pleased
with its current state.
It currently has a ranking algorithm based on what it matched on
(name, annotation/emojione keyword), and how successfully.
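
A rough Python sketch of what ranking by "what it matched on, and how
successfully" could look like. This is an illustration only, not the actual
ibus-uniemoji code: the function names, the sample entries, and the score
values are all hypothetical.

```python
def score(query, name, keywords):
    """Hypothetical scoring: a higher number means a better match."""
    q = query.lower()
    name = name.lower()
    if q == name:
        return 100              # the query is the whole name
    if q in name.split():
        return 75               # the query is one word of the name
    if q in (k.lower() for k in keywords):
        return 50               # the query is an annotation/keyword
    if q in name:
        return 25               # the query is only a substring of the name
    return 0                    # no match at all


def rank(query, entries):
    """Return the matching names, best first; ties keep their input order."""
    scored = [(score(query, name, keywords), name) for name, keywords in entries]
    scored.sort(key=lambda pair: -pair[0])
    return [name for points, name in scored if points > 0]


# Made-up sample entries: (name, keywords)
ENTRIES = [
    ("bus", ["vehicle"]),
    ("flag for united states", ["us", "america"]),
    ("muscle", []),
]

print(rank("us", ENTRIES))
```

With these made-up weights a keyword hit outranks a plain substring hit, so
the flag entry is listed before "bus" and "muscle".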

From tfujiwar at redhat.com  Mon Jun 27 02:48:20 2016
From: tfujiwar at redhat.com (Takao Fujiwara)
Date: Mon, 27 Jun 2016 16:48:20 +0900
Subject: Emoji and Annotation data
In-Reply-To: <017A6AB6-EB2B-4126-9866-C6FD13E17149@me.com>
References: <07d3a922-b9a3-e8cb-df09-746796c8e0d4@redhat.com>
 <CAJ2xs_E4_4xSvJOGXjPZzcE-JowC+Hwv5d2BH1+TY1dr5-fe=Q@mail.gmail.com>
 <2985c941-9c12-7b9f-7ec3-1fcf49ee5ea5@redhat.com>
 <A77D9984-0F99-45BD-BA53-A833B95E6B7F@me.com>
 <6ad6f653-7d3d-412d-f914-51eade9ebe9a@redhat.com>
 <017A6AB6-EB2B-4126-9866-C6FD13E17149@me.com>
Message-ID: <8281cb72-c09b-1a8f-0257-4af462d1c9c7@redhat.com>

On 06/27/16 16:01, Peter Edberg-san wrote:
> I had suggested that you check
> http://unicode.org/cldr/trac/browser/tags/latest/common/annotations/en.xml
> which has the line
> <annotation cp='[😀]' tts='grinning face'>face; grin</annotation>
>
> Is that not what you want?

I'm sorry. I missed that.
OK, it seems emoji-list.html is a combination of en.xml and /Public/emoji/3.0/emoji-*.txt.
However, I cannot find some of the annotations, e.g. "america".

BTW, I think more categories such as "animal" and "country" would be useful as annotations.
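
A minimal Python sketch of building the annotation-to-emoji dictionary from
XML of the shape quoted above. It assumes each cp attribute holds a single
character in square brackets and that keywords are separated by ";". The
first sample entry is the line quoted above; the second is made up for
illustration and is not copied from en.xml. The name build_index is
hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Inline sample in the shape of CLDR's annotations/en.xml.
# The second <annotation> is invented for this example.
SAMPLE = """<ldml>
  <annotations>
    <annotation cp='[😀]' tts='grinning face'>face; grin</annotation>
    <annotation cp='[😁]' tts='sample entry'>eye; face; grin; smile</annotation>
  </annotations>
</ldml>"""


def build_index(xml_text):
    """Map each annotation keyword to the list of emoji that carry it."""
    index = defaultdict(list)
    root = ET.fromstring(xml_text)
    for node in root.iter("annotation"):
        # Assumption: cp is a one-character set such as '[😀]'.
        emoji = node.get("cp", "").strip("[]")
        for keyword in (node.text or "").split(";"):
            keyword = keyword.strip()
            if keyword:
                index[keyword].append(emoji)
    return dict(index)


index = build_index(SAMPLE)
print(index["grin"])
```

A real en.xml may use multi-character sets in cp, which this sketch does not
handle.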

Fujiwara

>
> - Peter
>
>
> On Jun 26, 2016, at 10:34 PM, Takao Fujiwara <tfujiwar at redhat.com> wrote:
>>
>> Hi,
>>
>> E.g. http://unicode.org/emoji/charts/emoji-list.html
>> "😀" has the annotations of "face" and "grin".
>>
>> The data is available in only the html files.
>>
>> Fujiwara
>>
>> On 06/27/16 14:16, Peter Edberg-san wrote:
>>> Fujiwara-san,
>>> If you follow the information indicated by UTR 51 (as Mark had suggested), you will see that:
>>>
>>> 1. The annotations data is available in CLDR here, in English:
>>> http://unicode.org/cldr/trac/browser/tags/latest/common/annotations/en.xml
>>> (or in many other languages, such as Japanese:)
>>> http://unicode.org/cldr/trac/browser/tags/latest/common/annotations/ja.xml
>>>
>>> The description of the format for those xml files is here:
>>> http://www.unicode.org/reports/tr35/tr35-general.html#Annotations
>>>
>>> 2. Other emoji data files are here:
>>> http://www.unicode.org/Public/emoji/latest/
>>>
>>> These data files are what drive the generation of the charts.
>>>
>>> Best regards,
>>> Peter Edberg
>>>
>>>
>>>
>>>> On Jun 26, 2016, at 9:09 PM, Takao Fujiwara <tfujiwar at redhat.com> wrote:
>>>>
>>>> On 06/25/16 01:04, Mark Davis ☕️-san wrote:
>>>>> You should never be scraping /any/ Unicode HTML files. They are not made for that, and there is no guarantee of stability.
>>>>
>>>> I cannot find the license or descriptions about the HTML files.
>>>>
>>>>>
>>>>> The emoji files are built from data which is described in http://www.unicode.org/reports/tr51/
>>>>> (plus CLDR annotations and collation)
>>>>
>>>> OK, I need the data which packages Emoji unicode and the annotation.
>>>> It would be great if the data could be provided besides the html files.
>>>>
>>>> Thanks,
>>>> Fujiwara
>>>>
>>>>>
>>>>> Mark
>>>>> //////
>>>>>
>>>>> On Fri, Jun 24, 2016 at 7:21 AM, Takao Fujiwara <tfujiwar at redhat.com <mailto:tfujiwar at redhat.com>> wrote:
>>>>>
>>>>>   Hi,
>>>>>
>>>>>   I'm working on IBus - the input method framework for Linux.
>>>>>   I parse http://unicode.org/emoji/charts/emoji-list.html and create a dictionary between the annotations and the Emoji characters.
>>>>>   Since the file size is large and it's often updated, I'm thinking how to maintain the file.
>>>>>
>>>>>   I copied the file as http://ibus.github.io/files/ibus/emoji-list.html for the build at the moment.
>>>>>
>>>>>   I have questions:
>>>>>    - if unicode.org <http://unicode.org> provides the tarball of the stable html files or other data.
>>>>>    - what is the license of the html files.
>>>>>
>>>>>   Do you have any ideas?
>>>>>
>>>>>   Thanks,
>>>>>   Fujiwara
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>


From tfujiwar at redhat.com  Mon Jun 27 03:00:51 2016
From: tfujiwar at redhat.com (Takao Fujiwara)
Date: Mon, 27 Jun 2016 17:00:51 +0900
Subject: Emoji and Annotation data
In-Reply-To: <CALgdb5KvFxSTQAqsSjw16=ibUd7531y7vCT0EErSbfkOSiiYwA@mail.gmail.com>
References: <07d3a922-b9a3-e8cb-df09-746796c8e0d4@redhat.com>
 <CALgdb5+_efZxnQksAjcJd9Gp-pKw+GiM6L2MHWT9gijOFN54bA@mail.gmail.com>
 <2e621021-44bd-12f0-932a-a1d6b50c361b@redhat.com>
 <CALgdb5KvFxSTQAqsSjw16=ibUd7531y7vCT0EErSbfkOSiiYwA@mail.gmail.com>
Message-ID: <412dcbbd-f803-df98-7172-489bec525335@redhat.com>

On 06/27/16 15:58, Ori Avtalion-san wrote:
> On Mon, Jun 27, 2016 at 7:13 AM, Takao Fujiwara <tfujiwar at redhat.com> wrote:
>> Why you don't use only annotations? E.g. "us" hits too many Emoji.
>
> It's for all kinds of Unicode symbols, not just those that have emoji
> representation.
> Sometimes I find myself searching by the "real" Unicode name, and
> sometimes by keyword, if I don't know what I'm looking for.

It's a bit strange to me that typing "us" hits "bus" and "muscle".
The following is the current implementation in IBus core:
https://github.com/ibus/ibus/commit/160d3c975a
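
To illustrate the difference, here is a small Python sketch comparing
substring matching with whole-word matching. It is not taken from the commit
above; the function names and the sample names are hypothetical.

```python
import re


def substring_match(query, name):
    """True if the query occurs anywhere inside the name."""
    return query.lower() in name.lower()


def word_match(query, name):
    """True only if the query equals one whole word of the name."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return query.lower() in words


# Made-up names for illustration.
NAMES = ["bus", "muscle", "flag us"]

print([n for n in NAMES if substring_match("us", n)])
print([n for n in NAMES if word_match("us", n)])
```

With substring matching, "us" hits all three names; with whole-word matching
it hits only "flag us".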

Fujiwara

>
> I keep tweaking it to provide better results, and I'm pretty pleased
> with its current state.
> It currently has a ranking algorithm based on what it matched on
> (name, annotation/emojione keyword), and how successfully.
>


From drmccreedy at gmail.com  Tue Jun 28 00:09:38 2016
From: drmccreedy at gmail.com (drmccreedy .)
Date: Mon, 27 Jun 2016 23:09:38 -0600
Subject: USAT value in the kIRG_USource property
Message-ID: <CADPxN9oEZCqmXs3aW_Z8meWt_c2PrwX8utJcSL=pmrERnP8O7A@mail.gmail.com>

I see one codepoint now has the kIRG_USource property value of "USAT" in
the Unihan_IRGSources.txt file from Unihan.zip:
   U+20991 kIRG_USource USAT-00061

UAX #45 (U-source Ideographs, http://www.unicode.org/reports/tr45/index.html)
mentions UTC and UCI but not USAT.

UAX #38 (Unicode Han Database, http://www.unicode.org/reports/tr38/)
updated the syntax for the kIRG_USource property (but not the description)
to U(TC|CI|SAT)-[0-9]{5} so I'm pretty sure it's not a typo.
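
For reference, a minimal Python sketch that checks a value against that
syntax. Only the pattern comes from UAX #38; the function name parse_usource
and its return shape are hypothetical.

```python
import re

# Pattern as given in the UAX #38 syntax for kIRG_USource.
K_IRG_USOURCE = re.compile(r"U(TC|CI|SAT)-[0-9]{5}")


def parse_usource(value):
    """Return (source, index) for a well-formed value, else None."""
    if K_IRG_USOURCE.fullmatch(value) is None:
        return None
    source, number = value.split("-")
    return source, int(number)


print(parse_usource("USAT-00061"))
```

The value from Unihan_IRGSources.txt above parses as source "USAT" with
index 61, so it is consistent with the updated syntax.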

Where can I find a description of the USAT value?

Thanks,

David McCreedy

From mpsuzuki at hiroshima-u.ac.jp  Tue Jun 28 01:17:45 2016
From: mpsuzuki at hiroshima-u.ac.jp (suzuki toshiya)
Date: Tue, 28 Jun 2016 15:17:45 +0900
Subject: [Unicode] USAT value in the kIRG_USource property
In-Reply-To: <6da30a5ddd554ad2b1709155972d8b0d@KL1PR04MB1637.apcprd04.prod.outlook.com>
References: <6da30a5ddd554ad2b1709155972d8b0d@KL1PR04MB1637.apcprd04.prod.outlook.com>
Message-ID: <57721689.3030305@hiroshima-u.ac.jp>

Dear David,

Your confusion is understandable. USAT source
characters were submitted by the Taisho Tripitaka digitization
project named "SAT" ( http://21dzk.l.u-tokyo.ac.jp/SAT/index_en.html ),
not by the UTC. Therefore, the current UAX #45 lacks this information.

# Originally, IRG experts had recommended using a "Z"
# prefix for SAT characters, but WG2 experts decided to
# use the "U" prefix.

Anyway, UAX #38 is expected to be updated for the syntax
and an appropriate reference. I will ask the SAT experts
for comments.

I don't think there is a requirement to update UAX #45
to include all USAT characters, but the title of the
UAX, "U-Source Ideographs", might be debatable.

Regards,
mpsuzuki

drmccreedy . wrote:
> I see one codepoint now has the kIRG_USource property value of "USAT" in 
> the Unihan_IRGSources.txt file from Unihan.zip:
>    U+20991 kIRG_USource USAT-00061
> 
> UAX #45 (U-source Ideographs, 
> http://www.unicode.org/reports/tr45/index.html) mentions UTC and UCI but 
> not USAT.
> 
> UAX #38 (Unicode Han Database, http://www.unicode.org/reports/tr38/) 
> updated the syntax for the kIRG_USource property (but not the 
> description) to U(TC|CI|SAT)-[0-9]{5} so I'm pretty sure it's not a typo.
> 
> Where can I find a description of the USAT value?
> 
> Thanks,
> 
> David McCreedy


From andrewcwest at gmail.com  Tue Jun 28 06:26:44 2016
From: andrewcwest at gmail.com (Andrew West)
Date: Tue, 28 Jun 2016 12:26:44 +0100
Subject: USAT value in the kIRG_USource property
In-Reply-To: <CADPxN9oEZCqmXs3aW_Z8meWt_c2PrwX8utJcSL=pmrERnP8O7A@mail.gmail.com>
References: <CADPxN9oEZCqmXs3aW_Z8meWt_c2PrwX8utJcSL=pmrERnP8O7A@mail.gmail.com>
Message-ID: <CALgEMhytvbES9dCYp90to-2zra72nUjc=wnvx6v4GZZwTLckyQ@mail.gmail.com>

David,

As Mr Suzuki says, despite the U prefix, USAT is not a Unicode source
character.  The reason why a solitary USAT source reference has
suddenly popped up in Ext. B is that several thousand ideographs were
proposed for encoding by SAT in what will be CJK Ext. F in Unicode
10.0 next year (there are currently 2,884 USAT characters in Ext. F).
At the WG2 meeting in Matsue Japan last year, in response to UK ballot
comments, USAT-00061 was unified with U+20991 in Ext. B (see
http://www.unicode.org/wg2/docs/n4701-M64-Recommendations.pdf
Recommendation M64.05c).  I suppose that the Unicode Standard will be
updated with a description of SAT when Ext. F is included in v. 10
next year.

Andrew


On 28 June 2016 at 06:09, drmccreedy . <drmccreedy at gmail.com> wrote:
> I see one codepoint now has the kIRG_USource property value of "USAT" in the
> Unihan_IRGSources.txt file from Unihan.zip:
>    U+20991 kIRG_USource USAT-00061
>
> UAX #45 (U-source Ideographs,
> http://www.unicode.org/reports/tr45/index.html) mentions UTC and UCI but not
> USAT.
>
> UAX #38 (Unicode Han Database, http://www.unicode.org/reports/tr38/) updated
> the syntax for the kIRG_USource property (but not the description) to
> U(TC|CI|SAT)-[0-9]{5} so I'm pretty sure it's not a typo.
>
> Where can I find a description of the USAT value?
>
> Thanks,
>
> David McCreedy

From drmccreedy at gmail.com  Tue Jun 28 21:41:51 2016
From: drmccreedy at gmail.com (drmccreedy .)
Date: Tue, 28 Jun 2016 20:41:51 -0600
Subject: USAT value in the kIRG_USource property
In-Reply-To: <CALgEMhytvbES9dCYp90to-2zra72nUjc=wnvx6v4GZZwTLckyQ@mail.gmail.com>
References: <CADPxN9oEZCqmXs3aW_Z8meWt_c2PrwX8utJcSL=pmrERnP8O7A@mail.gmail.com>
 <CALgEMhytvbES9dCYp90to-2zra72nUjc=wnvx6v4GZZwTLckyQ@mail.gmail.com>
Message-ID: <CADPxN9r3C2b0nxYmier8Orf+xwaocU1eG273jZ-KM-aMSCyz5Q@mail.gmail.com>

Thank you both for the background.

David McCreedy

On Tue, Jun 28, 2016 at 5:26 AM, Andrew West <andrewcwest at gmail.com> wrote:

> David,
>
> As Mr Suzuki says, despite the U prefix, USAT is not a Unicode source
> character.  The reason why a solitary USAT source reference has
> suddenly popped up in Ext. B is that several thousand ideographs were
> proposed for encoding by SAT in what will be CJK Ext. F in Unicode
> 10.0 next year (there are currently 2,884 USAT characters in Ext. F).
> At the WG2 meeting in Matsue Japan last year, in response to UK ballot
> comments, USAT-00061 was unified with U+20991 in Ext. B (see
> http://www.unicode.org/wg2/docs/n4701-M64-Recommendations.pdf
> Recommendation M64.05c).  I suppose that the Unicode Standard will be
> updated with a description of SAT when Ext. F is included in v. 10
> next year.
>
> Andrew
>
>
>
> On 28 June 2016 at 06:09, drmccreedy . <drmccreedy at gmail.com> wrote:
> > I see one codepoint now has the kIRG_USource property value of "USAT" in
> the
> > Unihan_IRGSources.txt file from Unihan.zip:
> >    U+20991 kIRG_USource USAT-00061
> >
> > UAX #45 (U-source Ideographs,
> > http://www.unicode.org/reports/tr45/index.html) mentions UTC and UCI
> but not
> > USAT.
> >
> > UAX #38 (Unicode Han Database, http://www.unicode.org/reports/tr38/)
> updated
> > the syntax for the kIRG_USource property (but not the description) to
> > U(TC|CI|SAT)-[0-9]{5} so I'm pretty sure it's not a typo.
> >
> > Where can I find a description of the USAT value?
> >
> > Thanks,
> >
> > David McCreedy
>