From unicode at unicode.org  Tue Mar  3 15:53:20 2020
From: unicode at unicode.org (Daniel Bünzli via Unicode)
Date: Tue, 3 Mar 2020 22:53:20 +0100
Subject: UAX #14 for 13.0.0: LB27's first line is obsolete
Message-ID: <etPan.5e5ed1d0.13da044c.7a8e@erratique.ch>

Hello,

I think (more precisely my compiler thinks [1]) the first line of LB27 is already handled by the new LB22 rule and can be removed.

Best,

Daniel

[1]
File "uuseg_line_break.ml", line 206, characters 38-40:

206 |   | (* LB27 *)  _, (JL|JV|JT|H2|H3), (IN|PO) -> no_boundary s
                                            ^^
Warning 12: this sub-pattern is unused.
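The warning amounts to rule shadowing under first-match evaluation. A minimal Python sketch, assuming (as this thread implies) that the 13.0 LB22 reads `× IN` with no restriction on the preceding class. The rule list is a hypothetical reduction to LB22 plus the two LB27 lines matched by the OCaml pattern, not the full UAX #14 rule set, and the names `LB27a`/`LB27b` are made up for the example:

```python
# Hypothetical reduction of UAX #14 (Unicode 13.0) to LB22 and the two LB27
# lines from the OCaml pattern above; classes are Line_Break values.
HANGUL = {"JL", "JV", "JT", "H2", "H3"}

# Rules are tried in order; the first one matching a (before, after) pair
# decides the position (every rule listed here means "no break").
RULES = [
    ("LB22", lambda before, after: after == "IN"),                        # x IN
    ("LB27a", lambda before, after: before in HANGUL and after == "IN"),  # (JL|JV|JT|H2|H3) x IN
    ("LB27b", lambda before, after: before in HANGUL and after == "PO"),  # (JL|JV|JT|H2|H3) x PO
]

def first_match(before, after):
    """Name of the first rule matching the pair, or None if none applies."""
    for name, matches in RULES:
        if matches(before, after):
            return name
    return None

def unreachable_rules(classes):
    """Rules that never win for any (before, after) pair drawn from classes."""
    winners = {first_match(b, a) for b in classes for a in classes}
    return {name for name, _ in RULES} - winners

CLASSES = sorted(HANGUL | {"IN", "PO", "AL", "NU"})
print(unreachable_rules(CLASSES))  # {'LB27a'}
```

Exhaustive enumeration over a handful of classes is only a stand-in for the OCaml compiler's pattern analysis, but it flags the same line: every Hangul-before-IN pair is already decided by LB22.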


From unicode at unicode.org  Tue Mar  3 17:22:20 2020
From: unicode at unicode.org (Andy Heninger via Unicode)
Date: Tue, 3 Mar 2020 15:22:20 -0800
Subject: UAX #14 for 13.0.0: LB27's first line is obsolete
In-Reply-To: <etPan.5e5ed1d0.13da044c.7a8e@erratique.ch>
References: <etPan.5e5ed1d0.13da044c.7a8e@erratique.ch>
Message-ID: <CAEtzAy5NOGq6SxmiF3KDpDXzzR3W58kMH+SywyJsPmeOAgCo+A@mail.gmail.com>

I agree. The LB27 first part rule

(JL | JV | JT | H2 | H3) × IN

appears to be redundant.

Good catch.

  -- Andy


From unicode at unicode.org  Wed Mar  4 11:01:25 2020
From: unicode at unicode.org (Daniel Bünzli via Unicode)
Date: Wed, 4 Mar 2020 18:01:25 +0100
Subject: UAX #29 and WB4
Message-ID: <etPan.5e5fdee5.513ca94f.7a8e@erratique.ch>

Hello,

My implementation of word break chokes only on the following test case from the file [1]:

÷ 0020 × 0308 ÷ 0020 ÷  #  ÷ [0.2] SPACE (WSegSpace) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] SPACE (WSegSpace) ÷ [0.3]

I find:

÷ 0020 × 0308 × 0020 ÷

Basically, my implementation uses WB4 to rewrite the first two characters to WSegSpace and then applies WB3d, resulting in a non-break between 0308 and 0020.

Re-reading the text, I suspect I should not restart from the first rule when a WB4 rewrite occurs, but only apply the subsequent rules. Is that correct?

Best,

Daniel

[1]: https://unicode.org/Public/13.0.0/ucd/auxiliary/WordBreakTest.txt


From unicode at unicode.org  Wed Mar  4 11:48:09 2020
From: unicode at unicode.org (Daniel Bünzli via Unicode)
Date: Wed, 4 Mar 2020 18:48:09 +0100
Subject: UAX #29 and WB4
In-Reply-To: <etPan.5e5fdee5.513ca94f.7a8e@erratique.ch>
References: <etPan.5e5fdee5.513ca94f.7a8e@erratique.ch>
Message-ID: <etPan.5e5fe9d9.b1d996c.7a8e@erratique.ch>

On 4 March 2020 at 18:01:25, Daniel Bünzli (daniel.buenzli at erratique.ch) wrote:

> Re-reading the text I suspect I should not restart the rules from the first one when a WB4  
> rewrite occurs but only apply the subsequent rules. Is that correct ?

However, even if that's correct, I don't understand how this test case works:

÷ 1F6D1 × 200D × 1F6D1 ÷  #  ÷ [0.2] OCTAGONAL SIGN (ExtPict) × [4.0] ZERO WIDTH JOINER (ZWJ_FE) × [3.3] OCTAGONAL SIGN (ExtPict) ÷ [0.3]

Here the first two characters get rewritten by WB4 to ExtPict; then, if only the subsequent rules are applied, we end up in WB999 and a break between 200D and 1F6D1. The justification in the comment says WB3c applies to the ZWJ, but that character should already have been rewritten to ExtPict by WB4.

Best,

Daniel


From unicode at unicode.org  Wed Mar  4 13:26:42 2020
From: unicode at unicode.org (Daniel Bünzli via Unicode)
Date: Wed, 4 Mar 2020 20:26:42 +0100
Subject: UAX #29 and WB4
In-Reply-To: <etPan.5e5fe9d9.b1d996c.7a8e@erratique.ch>
References: <etPan.5e5fdee5.513ca94f.7a8e@erratique.ch>
 <etPan.5e5fe9d9.b1d996c.7a8e@erratique.ch>
Message-ID: <etPan.5e6000f2.7566acdc.7a8e@erratique.ch>

On 4 March 2020 at 18:48:09, Daniel Bünzli (daniel.buenzli at erratique.ch) wrote:

> On 4 March 2020 at 18:01:25, Daniel Bünzli (daniel.buenzli at erratique.ch) wrote:
>  
> > Re-reading the text I suspect I should not restart the rules from the first one when a  
> WB4
> > rewrite occurs but only apply the subsequent rules. Is that correct ?
>  
> However even if that's correct I don't understand how this test case works:
>  
> ÷ 1F6D1 × 200D × 1F6D1 ÷  #  ÷ [0.2] OCTAGONAL SIGN (ExtPict) × [4.0] ZERO WIDTH JOINER (ZWJ_FE)
> × [3.3] OCTAGONAL SIGN (ExtPict) ÷ [0.3]
>
> Here the first two chars get rewritten with WB4 to ExtPict then if only subsequent rules
> are applied we end up in WB999 and a break between 200D and 1F6D1.

That's nonsense, and not the operational model of the algorithm, which IIRC was once clearly stated on this list by Mark Davis (sorry, I failed to dig out the message): take each candidate boundary position, apply the rules in sequence, take the first one that matches, and then start over with the next candidate position.

In this case, applying the rules between 1F6D1 and 200D leads to WB4, which implicitly adds a non-boundary condition for that position (this is not really evident from the formalism, but see the comment above WB4) and settles it. Then we start again, applying the rules between 200D and the last 1F6D1, and WB3c matches before WB4 kicks in.

I think the behaviour of → rules should be clarified: it's not clear which data they apply to with respect to the candidate boundary position. If I understand correctly, if the match spans the candidate boundary position, that simply turns it into a non-boundary. Otherwise, you apply the rule to the left of the candidate boundary position.
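The model described above (each candidate position evaluated on its own, first matching rule wins, a WB4 match spanning the position making it a non-boundary, and later rules seeing the left context with Extend/Format/ZWJ skipped but never past the position) can be sketched in Python. This is a hypothetical reduction to WB3c, WB3d, WB4, WB5 and WB999 over pre-classified property names; WB3 to WB3b and WB6 onward are omitted, and "ExtPict" stands for Extended_Pictographic as labelled in WordBreakTest.txt:

```python
# Sketch of per-position rule evaluation for a subset of UAX #29 word break
# rules. Input is a list of property names (hypothetical pre-classified
# values); WB3-WB3b and WB6 onward are omitted.
IGNORABLE = {"Extend", "Format", "ZWJ"}
AHLETTER = {"ALetter", "Hebrew_Letter"}

def is_boundary(props, i):
    """Decide the candidate position between props[i - 1] and props[i]."""
    before, after = props[i - 1], props[i]
    # WB3c: ZWJ x \p{Extended_Pictographic}. Precedes WB4: raw characters.
    if before == "ZWJ" and after == "ExtPict":
        return False
    # WB3d: WSegSpace x WSegSpace. Also precedes WB4: raw characters only.
    if before == "WSegSpace" and after == "WSegSpace":
        return False
    # WB4: X (Extend | Format | ZWJ)* -> X. A match spanning this position
    # simply makes it a non-boundary.
    if after in IGNORABLE:
        return False
    # Later rules see the left side with ignorables skipped, scanning only
    # leftwards from position i (the "treat as" never reaches past it).
    j = i - 1
    while j > 0 and props[j] in IGNORABLE:
        j -= 1
    left = props[j]
    # WB5: AHLetter x AHLetter, tested on the "treated as" left character.
    if left in AHLETTER and after in AHLETTER:
        return False
    # WB999: otherwise, break.
    return True

def boundaries(props):
    """Return boundary offsets, including sot (WB1) and eot (WB2)."""
    inner = [i for i in range(1, len(props)) if is_boundary(props, i)]
    return [0] + inner + [len(props)]

# The two test cases from this thread, plus a WB4-then-WB5 case.
print(boundaries(["WSegSpace", "Extend", "WSegSpace"]))  # [0, 2, 3]
print(boundaries(["ExtPict", "ZWJ", "ExtPict"]))         # [0, 3]
print(boundaries(["ALetter", "Extend", "ALetter"]))      # [0, 3]
```

In this sketch WB3c and WB3d test the raw `before` character, so the space after U+0308 gets its break from WB999, while the ZWJ before the second OCTAGONAL SIGN is still seen by WB3c.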

Regarding the question in my original message, it seems that at some point I knew better:

  https://www.unicode.org/mail-arch/unicode-ml/y2016-m11/0151.html

Sorry for the noise.

Daniel

P.S. I still think UAX #29 and UAX #14 could benefit from clarifying the operational model of the rules a bit. (I also have the impression that the formalism used to express all this may not be the right one, but I don't have anything better to propose at the moment.) It would also be nicer for implementers if they didn't have to factorize rules themselves (as with the new LB30 rules of UAX #14), so that the correctness of implemented rules is easier to assert.


From unicode at unicode.org  Wed Mar  4 17:58:57 2020
From: unicode at unicode.org (Mark Davis ☕️ via Unicode)
Date: Wed, 4 Mar 2020 15:58:57 -0800
Subject: UAX #29 and WB4
In-Reply-To: <etPan.5e6000f2.7566acdc.7a8e@erratique.ch>
References: <etPan.5e5fdee5.513ca94f.7a8e@erratique.ch>
 <etPan.5e5fe9d9.b1d996c.7a8e@erratique.ch>
 <etPan.5e6000f2.7566acdc.7a8e@erratique.ch>
Message-ID: <CAJ2xs_GCFiSmCWPEeQ0vb-+5HyfWw6GnOHa8Me_65PQ=Lyu+SQ@mail.gmail.com>

One thing we have considered for a while is whether to do a rewrite of the
rules to simplify the processing (and avoid the "treat as" rules), but it
would take a fair amount of design work that we haven't had time to do. If
you (or others) are interested in getting involved, please let us know.

Mark



From unicode at unicode.org  Fri Mar  6 21:36:31 2020
From: unicode at unicode.org (Zack Newman via Unicode)
Date: Fri, 6 Mar 2020 20:36:31 -0700
Subject: UAX #29 6.2
Message-ID: <CADdMYoH4dhm8pRZASZK6Or3hjRwJAB2bosnGD0yYu9JvS=7c9g@mail.gmail.com>

According to 6.2, "thus ignoring Extend is sufficient to disallow breaking
within a grapheme cluster." However, the sequence of Unicode scalar values
(U+0600, U+0020) is a single grapheme cluster due to rule GB9b (U+0600 is
Prepend), but it is split into two words according to 4.1.1. While it
would be ideal not to have sequences of Unicode scalar values that split
into more words than grapheme clusters, I think it would be less confusing
if section 6.2 did not explicitly state that this cannot happen.

From unicode at unicode.org  Sat Mar  7 13:36:56 2020
From: unicode at unicode.org (Rick McGowan via Unicode)
Date: Sat, 07 Mar 2020 11:36:56 -0800
Subject: Reminder about reporting bugs, errors, and other feedback
Message-ID: <5E63F7D8.4050309@unicode.org>

Hello everyone...

This is just a little public service reminder that discussions on the 
Unicode mail list are not considered official feedback, and are not 
reviewed by UTC members or staff as a source for bug reports.

If you want to make sure your feedback and/or report gets into the UTC 
process, it is best to submit it through our reporting form, which can 
be found here:

https://www.unicode.org/reporting.html

Cheers,


From unicode at unicode.org  Tue Mar 10 00:00:57 2020
From: unicode at unicode.org (Andy Heninger via Unicode)
Date: Mon, 9 Mar 2020 22:00:57 -0700
Subject: UAX #29 and WB4
In-Reply-To: <CAJ2xs_GCFiSmCWPEeQ0vb-+5HyfWw6GnOHa8Me_65PQ=Lyu+SQ@mail.gmail.com>
References: <etPan.5e5fdee5.513ca94f.7a8e@erratique.ch>
 <etPan.5e5fe9d9.b1d996c.7a8e@erratique.ch>
 <etPan.5e6000f2.7566acdc.7a8e@erratique.ch>
 <CAJ2xs_GCFiSmCWPEeQ0vb-+5HyfWw6GnOHa8Me_65PQ=Lyu+SQ@mail.gmail.com>
Message-ID: <CAEtzAy4Xh+ngnjBoQgaoV3AktiD-6CzNdC_LfsuBPw1H5s6YjA@mail.gmail.com>

daniel.buenzli wrote:

> I think the behaviour of → rules should be clarified

I wholeheartedly agree.

> If I understand correctly if the match [or a "treat-as" rule] spans over
> the [candidate] boundary position candidate that simply turns it into a
> non-boundary. Otherwise you apply the rule on the left of the boundary
> position candidate.


I have treated the extent of a left-side treat-as match as not continuing
beyond the candidate boundary position. This comes into play following a
ZWJ, which may be absorbed into a "treat as" on the left (WB4) while
some other rule triggers on the right side (WB3c). At any rate, this is
what I do in ICU. It gets very confusing, and is tricky to implement.

Reconsidering how the ZWJ rules work could also help, if we could figure
out how to keep them out of the "treat as" rules and use explicit no-break
rules on both sides instead.

  -- Andy


From unicode at unicode.org  Wed Mar 11 12:29:06 2020
From: unicode at unicode.org (Karl Williamson via Unicode)
Date: Wed, 11 Mar 2020 11:29:06 -0600
Subject: EGYPTIAN HIEROGLYPH MAN WITH A ROLL OF TOILET PAPER
In-Reply-To: <882d3d3d-c164-3681-2e1b-76289bc0500e@gmail.com>
References: <882d3d3d-c164-3681-2e1b-76289bc0500e@gmail.com>
Message-ID: <ee02fbd3-a8ca-0afd-aa85-e47addf3648a@khwilliamson.com>

On 2/12/20 11:12 AM, Frédéric Grosshans via Unicode wrote:
> Dear Unicode list members (CC Michel Suignard),
> 
>    the Unicode proposal L2/20-068 
> <https://www.unicode.org/L2/L2020/20068-n5128-ext-hieroglyph.pdf>, 
> "Revised draft for the encoding of an extended Egyptian Hieroglyphs 
> repertoire, Groups A to N" ( 
> https://www.unicode.org/L2/L2020/20068-n5128-ext-hieroglyph.pdf ) by 
> Michel Suignard contains a very interesting hieroglyph at position 
> *U+13579 EGYPTIAN HIEROGLYPH A-12-054, which seems to represent a man 
> with a laptop, as can be obvious in the attached image.
> 

Someone suggested today that this would be the more up-to-date character


From unicode at unicode.org  Fri Mar 20 07:21:26 2020
From: unicode at unicode.org (Costello, Roger L. via Unicode)
Date: Fri, 20 Mar 2020 12:21:26 +0000
Subject: Is the binaryness/textness of a data format a property?
Message-ID: <BL0PR0901MB312408FDAE2CEE57F4FB97BFC8F50@BL0PR0901MB3124.namprd09.prod.outlook.com>

Hello Data Format Experts!

[Definition] Property: an attribute, quality, or characteristic of something.

JPEG is a binary data format.
CSV is a text data format.

Question #1: Is the binaryness/textness of a data format a property? 

Question #2: If the answer to Question #1 is yes, then what is the name of this binaryness/textness property?

Question #3: Here is another way of asking Question #2: Please fill in the following blanks with the property name (both blanks should be filled with the same thing):

For the JPEG data format:  _____ = binary.
For the CSV data format:  _____ = text. 

/Roger


From unicode at unicode.org  Fri Mar 20 07:36:34 2020
From: unicode at unicode.org (Dreiheller, Albrecht via Unicode)
Date: Fri, 20 Mar 2020 12:36:34 +0000
Subject: Re: Is the binaryness/textness of a data format a property?
In-Reply-To: <BL0PR0901MB312408FDAE2CEE57F4FB97BFC8F50@BL0PR0901MB3124.namprd09.prod.outlook.com>
References: <BL0PR0901MB312408FDAE2CEE57F4FB97BFC8F50@BL0PR0901MB3124.namprd09.prod.outlook.com>
Message-ID: <AM0PR10MB2945F2B27A22F59BB5AE0DA8F6F50@AM0PR10MB2945.EURPRD10.PROD.OUTLOOK.COM>

#1: Yes.
#2: [ my suggestion ]  File type category

A.D.



From unicode at unicode.org  Fri Mar 20 07:46:25 2020
From: unicode at unicode.org (Adam Borowski via Unicode)
Date: Fri, 20 Mar 2020 13:46:25 +0100
Subject: Is the binaryness/textness of a data format a property?
In-Reply-To: <BL0PR0901MB312408FDAE2CEE57F4FB97BFC8F50@BL0PR0901MB3124.namprd09.prod.outlook.com>
References: <BL0PR0901MB312408FDAE2CEE57F4FB97BFC8F50@BL0PR0901MB3124.namprd09.prod.outlook.com>
Message-ID: <20200320124625.GC32403@angband.pl>

On Fri, Mar 20, 2020 at 12:21:26PM +0000, Costello, Roger L. via Unicode wrote:
> [Definition] Property: an attribute, quality, or characteristic of something.
> 
> JPEG is a binary data format.
> CSV is a text data format.
> 
> Question #1: Is the binaryness/textness of a data format a property? 
> 
> Question #2: If the answer to Question #1 is yes, then what is the name of
> this binaryness/textness property?

I'm afraid this question is too fuzzy to have a proper answer.

For example, most Unix-heads will tell you that UTF16LE is a binary rather
than text format.  Microsoft employees and some members of this list will
disagree.

Then you have Postscript -- nothing but basic ASCII, yet utterly unreadable
for a (sane) human.

If you want _my_ definition of a file being _technically_ text, it's:
* no bytes 0..31 other than newlines and tabs (even form feeds are out
  nowadays)
* correctly encoded for the expected charset (and nowadays, if that's not
  UTF-8 Unicode, you're doing it wrong)
* no invalid characters
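The three conditions above can be sketched as a byte-level check in Python. `looks_like_text` is a hypothetical helper; it assumes "newlines" covers both LF and CR, that the expected charset is UTF-8, and that "no invalid characters" is covered by strict decoding (so it does not reject unassigned code points):

```python
# Sketch of the "technically text" test above. Assumptions: "newlines" means
# LF and CR, and "correctly encoded" means strict UTF-8.
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D}  # TAB, LF, CR

def looks_like_text(data: bytes) -> bool:
    """True if data is strict UTF-8 with no C0 controls besides TAB/LF/CR."""
    if any(b < 0x20 and b not in ALLOWED_CONTROLS for b in data):
        return False  # e.g. NUL, form feed, ESC
    try:
        # Strict decoding rejects ill-formed sequences, encoded surrogates
        # and anything beyond U+10FFFF; it does not check assignment.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

print(looks_like_text("naïve café\n".encode("utf-8")))  # True
print(looks_like_text("hi\n".encode("utf-16-le")))      # False: 0x00 bytes
print(looks_like_text(b"\x1b[31mred\x1b[0m\n"))         # False: ESC is C0
```

Under these assumptions UTF-16LE fails on its NUL bytes alone, which matches the Unix-centric view described above.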

But besides this narrow technical meaning -- is a Word document "text"?
And if it is, why not Powerpoint?  This all falls apart.


Meow!
-- 
in the beginning was the boot and root floppies and they were good.
                                      -- <willmore> on #linux-sunxi

From unicode at unicode.org  Fri Mar 20 09:22:45 2020
From: unicode at unicode.org (J Decker via Unicode)
Date: Fri, 20 Mar 2020 07:22:45 -0700
Subject: Is the binaryness/textness of a data format a property?
In-Reply-To: <20200320124625.GC32403@angband.pl>
References: <BL0PR0901MB312408FDAE2CEE57F4FB97BFC8F50@BL0PR0901MB3124.namprd09.prod.outlook.com>
 <20200320124625.GC32403@angband.pl>
Message-ID: <CAA2GJqWh3ozpCJ7cngBvPKNEsrismWF31QJM-sdDZrBk69cJiw@mail.gmail.com>

On Fri, Mar 20, 2020 at 5:48 AM Adam Borowski via Unicode <
unicode at unicode.org> wrote:

> On Fri, Mar 20, 2020 at 12:21:26PM +0000, Costello, Roger L. via Unicode
> wrote:
> > [Definition] Property: an attribute, quality, or characteristic of
> something.
> >
> > JPEG is a binary data format.
> > CSV is a text data format.
> >
> > Question #1: Is the binaryness/textness of a data format a property?
> >
> > Question #2: If the answer to Question #1 is yes, then what is the name
> of
> > this binaryness/textness property?
>
> I'm afraid this question is too fuzzy to have a proper answer.
>
> For example, most Unix-heads will tell you that UTF16LE is a binary rather
> than text format.  Microsoft employees and some members of this list will
> disagree.
>
> Then you have Postscript -- nothing but basic ASCII, yet utterly unreadable
> for a (sane) human.
>
> If you want _my_ definition of a file being _technically_ text, it's:
> * no bytes 0..31 other than newlines and tabs (even form feeds are out
>   nowadays)
> * correctly encoded for the expected charset (and nowadays, if that's not
>   UTF-8 Unicode, you're doing it wrong)
> * no invalid characters
>

Just a minor note...
In the case of UTF-8, this means no bytes 0xF8-0xFF will ever be used; every
valid UTF-8 code unit has at least one bit off.
I wouldn't be so picky about 'no bytes 0-31', because \t, \n and \x1b (ANSI
codes) are all quite usable...
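The byte-range observation can be checked by brute force. A small Python sketch that encodes every Unicode scalar value with the built-in strict UTF-8 codec and lists the byte values that never occur. Under the current definition (RFC 3629, capped at U+10FFFF) the excluded set is somewhat larger than 0xF8-0xFF, since 0xC0, 0xC1 and 0xF5-0xF7 cannot occur either:

```python
def utf8_bytes_in_use():
    """Collect every byte value appearing in the UTF-8 form of some scalar value."""
    seen = set()
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not scalar values and have no UTF-8 form
        seen.update(chr(cp).encode("utf-8"))  # iterating bytes yields ints
    return seen

never_used = sorted(set(range(256)) - utf8_bytes_in_use())
print(" ".join(f"{b:02X}" for b in never_used))
# C0 C1 F5 F6 F7 F8 F9 FA FB FC FD FE FF
```

0xC0 and 0xC1 could only start overlong two-byte forms, and 0xF5 and above would start sequences beyond U+10FFFF.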



From unicode at unicode.org  Fri Mar 20 09:41:23 2020
From: unicode at unicode.org (Adam Borowski via Unicode)
Date: Fri, 20 Mar 2020 15:41:23 +0100
Subject: Is the binaryness/textness of a data format a property?
In-Reply-To: <CAA2GJqWh3ozpCJ7cngBvPKNEsrismWF31QJM-sdDZrBk69cJiw@mail.gmail.com>
References: <BL0PR0901MB312408FDAE2CEE57F4FB97BFC8F50@BL0PR0901MB3124.namprd09.prod.outlook.com>
 <20200320124625.GC32403@angband.pl>
 <CAA2GJqWh3ozpCJ7cngBvPKNEsrismWF31QJM-sdDZrBk69cJiw@mail.gmail.com>
Message-ID: <20200320144123.GA6554@angband.pl>

On Fri, Mar 20, 2020 at 07:22:45AM -0700, J Decker via Unicode wrote:
> On Fri, Mar 20, 2020 at 5:48 AM Adam Borowski via Unicode <
> > For example, most Unix-heads will tell you that UTF16LE is a binary rather
> > than text format.  Microsoft employees and some members of this list will
> > disagree.
[...]
> > If you want _my_ definition of a file being _technically_ text, it's:
> > * no bytes 0..31 other than newlines and tabs (even form feeds are out
> >   nowadays)
> > * correctly encoded for the expected charset (and nowadays, if that's not
> >   UTF-8 Unicode, you're doing it wrong)
> > * no invalid characters
> 
> Just a minor note...
> In the case of UTF8, this means no bytes 0xF8-0xFF will ever be used; every
> valid utf8 codeunit has at least 1 bit off.

Yeah, but I allowed for ancient encodings, some of which do use these bytes.
(I do discriminate against UTF16 and shift-state ones, they're too broken.)

Also, UTF-8 can carry more than Unicode -- for example, U+D800..U+DFFF or
U+110000..U+7FFFFFFF (or possibly even up to 2?? or 2??), which has its uses
but is not well-formed Unicode.
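The distinction can be made concrete in Python. `encode_legacy_utf8` below is a hypothetical helper implementing the original variable-length scheme (RFC 2279: up to six bytes, reaching U+7FFFFFFF); Python's built-in codec implements the current definition and rejects both encoded surrogates and anything past U+10FFFF. Note that the range beyond Unicode starts at U+110000, the first value past the codespace:

```python
def encode_legacy_utf8(cp: int) -> bytes:
    """Encode cp with the pre-RFC 3629 1- to 6-byte scheme (RFC 2279).

    Hypothetical helper: surrogates and values above U+10FFFF come out
    "UTF-8 shaped" but are not UTF-8 under RFC 3629 / the Unicode Standard.
    """
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("outside the 31-bit range of the legacy scheme")
    if cp < 0x80:
        return bytes([cp])
    # (sequence length, largest value it can hold, lead-byte marker)
    forms = [(2, 0x7FF, 0xC0), (3, 0xFFFF, 0xE0), (4, 0x1FFFFF, 0xF0),
             (5, 0x3FFFFFF, 0xF8), (6, 0x7FFFFFFF, 0xFC)]
    length, _, lead = next(f for f in forms if cp <= f[1])
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (cp & 0x3F))  # 6 payload bits per continuation byte
        cp >>= 6
    return bytes([lead | cp] + tail[::-1])

def is_strict_utf8(data: bytes) -> bool:
    """True if Python's strict UTF-8 decoder accepts data."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

for cp in (0x10FFFF, 0xD800, 0x110000, 0x3FFFFFF, 0x7FFFFFFF):
    encoded = encode_legacy_utf8(cp)
    print(f"U+{cp:04X}: {encoded.hex(' ')}  strict UTF-8: {is_strict_utf8(encoded)}")
```

Only the first of these, U+10FFFF (F4 8F BF BF), is accepted by the strict decoder; the others are exactly the sequences that are "no longer UTF-8".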

> I wouldn't be so picky about 'no bytes 0-31' because \t, \n, \x1b(ANSI
> codes) are all quite usable...

\t is tab, \n a newline (blah blah blah \r).

As for \e (\x1b), that's higher-level markup.  I do use it -- hey, you can
"apt/dnf install colorized-logs" for my tools -- but that's beyond plain
text.



From unicode at unicode.org  Fri Mar 20 09:49:24 2020
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Fri, 20 Mar 2020 14:49:24 +0000
Subject: Is the binaryness/textness of a data format a property?
In-Reply-To: <20200320124625.GC32403@angband.pl>
References: <BL0PR0901MB312408FDAE2CEE57F4FB97BFC8F50@BL0PR0901MB3124.namprd09.prod.outlook.com>
 <20200320124625.GC32403@angband.pl>
Message-ID: <20200320144924.6dfab15a@JRWUBU2>

On Fri, 20 Mar 2020 13:46:25 +0100
Adam Borowski via Unicode <unicode at unicode.org> wrote:

> On Fri, Mar 20, 2020 at 12:21:26PM +0000, Costello, Roger L. via
> Unicode wrote:
> > [Definition] Property: an attribute, quality, or characteristic of
> > something.
> > 
> > JPEG is a binary data format.
> > CSV is a text data format.
> > 
> > Question #1: Is the binaryness/textness of a data format a
> > property? 
> > 
> > Question #2: If the answer to Question #1 is yes, then what is the
> > name of this binaryness/textness property?  

I'd suggest 'texthood' as the correct English term.

> I'm afraid this question is too fuzzy to have a proper answer.
> 
> For example, most Unix-heads will tell you that UTF16LE is a binary
> rather than text format.  Microsoft employees and some members of
> this list will disagree.

Some files change type on changing operating system.  Digital's old RMS
formats included, as basic text files, files in which each record (roughly
a line) started with a binary 2-byte length field.  Text records on
magnetic tape typically started with an ASCII length count!
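The point about length-prefixed records can be illustrated with a Python sketch. The layout is an assumption for illustration only (a 2-byte little-endian count before each record, with none of the details of real RMS files), and `read_length_prefixed_records` is a hypothetical name. Even when every record is plain ASCII, the length fields put bytes such as 0x00 into the file, which a byte-level "is it text?" check would reject:

```python
import struct
from typing import List

def read_length_prefixed_records(data: bytes) -> List[bytes]:
    """Split data into records, each preceded by a 2-byte little-endian length.

    Hypothetical layout; real RMS variable-length files have further rules
    that are not modelled here.
    """
    records = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated length field")
        (length,) = struct.unpack_from("<H", data, offset)
        offset += 2
        if offset + length > len(data):
            raise ValueError("truncated record")
        records.append(data[offset:offset + length])
        offset += length
    return records

# Two ASCII "lines" stored as records; note the 0x00 bytes in the length fields.
data = struct.pack("<H", 5) + b"hello" + struct.pack("<H", 5) + b"world"
print(data)                                # b'\x05\x00hello\x05\x00world'
print(read_length_prefixed_records(data))  # [b'hello', b'world']
```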

> Then you have Postscript -- nothing but basic ASCII, yet utterly
> unreadable for a (sane) human.

No worse than a hex dump - in fact, a lot more readable.  Indeed, are
you not aware of the concept of a write-only programming language? 

> If you want _my_ definition of a file being _technically_ text, it's:
> * no bytes 0..31 other than newlines and tabs (even form feeds are out
>   nowadays)
> * correctly encoded for the expected charset (and nowadays, if that's
> not UTF-8 Unicode, you're doing it wrong)
> * no invalid characters

Unassigned characters are perfectly reasonable in a text file.  Surely
you aren't saying that a text file using the characters new to Unicode
13.0 should, at present, usually be regarded as a binary file?

> But besides this narrow technical meaning -- is a Word document
> "text"? And if it is, why not Powerpoint?  This all falls apart.

Well, a .docx file isn't text - it's a variety of ZIP file, which is
binary.  Indeed, as Word files naturally include pictures, it very much
isn't a text file.  A .doc file is more like an image dump of a file
system.  A .rtf file, on the other hand, probably is a text file -
though I've a feeling there are variants that aren't *A*SCII.

Richard.

From unicode at unicode.org  Fri Mar 20 20:43:50 2020
From: unicode at unicode.org (Martin J. Dürst via Unicode)
Date: Sat, 21 Mar 2020 01:43:50 +0000
Subject: Is the binaryness/textness of a data format a property?
In-Reply-To: <20200320144123.GA6554@angband.pl>
References: <BL0PR0901MB312408FDAE2CEE57F4FB97BFC8F50@BL0PR0901MB3124.namprd09.prod.outlook.com>
 <20200320124625.GC32403@angband.pl>
 <CAA2GJqWh3ozpCJ7cngBvPKNEsrismWF31QJM-sdDZrBk69cJiw@mail.gmail.com>
 <20200320144123.GA6554@angband.pl>
Message-ID: <10828d7b-80a5-6282-4ef4-7ed075fde75a@it.aoyama.ac.jp>

On 20/03/2020 23:41, Adam Borowski via Unicode wrote:

> Also, UTF-8 can carry more than Unicode -- for example, U+D800..U+DFFF or
> U+110000..U+7FFFFFFF (or possibly even up to 2?? or 2??), which has its uses
> but is not well-formed Unicode.

This would definitely no longer be UTF-8!   Martin.


From unicode at unicode.org  Sat Mar 21 12:13:40 2020
From: unicode at unicode.org (Doug Ewell via Unicode)
Date: Sat, 21 Mar 2020 11:13:40 -0600
Subject: Is the binaryness/textness of a data format a property?
Message-ID: <000001d5ffa4$11d30860$35791920$@ewellic.org>

Adam Borowski wrote:

> Also, UTF-8 can carry more than Unicode -- for example, U+D800..U+DFFF
> or U+110000..U+7FFFFFFF (or possibly even up to 2?? or 2??), which has
> its uses but is not well-formed Unicode.

I'd be interested in your elaboration on what these uses are.

--
Doug Ewell | Thornton, CO, US | ewellic.org


From unicode at unicode.org  Sat Mar 21 14:23:45 2020
From: unicode at unicode.org (Eli Zaretskii via Unicode)
Date: Sat, 21 Mar 2020 21:23:45 +0200
Subject: Is the binaryness/textness of a data format a property?
In-Reply-To: <000001d5ffa4$11d30860$35791920$@ewellic.org> (message from Doug
 Ewell via Unicode on Sat, 21 Mar 2020 11:13:40 -0600)
References: <000001d5ffa4$11d30860$35791920$@ewellic.org>
Message-ID: <8336a1ecla.fsf@gnu.org>

> Date: Sat, 21 Mar 2020 11:13:40 -0600
> From: Doug Ewell via Unicode <unicode at unicode.org>
> 
> Adam Borowski wrote:
> 
> > Also, UTF-8 can carry more than Unicode -- for example, U+D800..U+DFFF
> > or U+11000..U+7FFFFFFF (or possibly even up to 2?? or 2??), which has
> > its uses but is not well-formed Unicode.
> 
> I'd be interested in your elaboration on what these uses are.

Emacs uses some of that for supporting charsets that cannot be mapped
into Unicode.  GB18030 is one example of such charsets.  The internal
representation of characters in Emacs is UTF-8, so it uses 5-byte
UTF-8-like sequences to represent such characters.

From unicode at unicode.org  Sat Mar 21 14:33:18 2020
From: unicode at unicode.org (Doug Ewell via Unicode)
Date: Sat, 21 Mar 2020 13:33:18 -0600
Subject: Is the binaryness/textness of a data format a property?
In-Reply-To: <8336a1ecla.fsf@gnu.org>
References: <000001d5ffa4$11d30860$35791920$@ewellic.org>
 <8336a1ecla.fsf@gnu.org>
Message-ID: <000701d5ffb7$93544850$b9fcd8f0$@ewellic.org>

Eli Zaretskii wrote:

>>> Also, UTF-8 can carry more than Unicode -- for example,
>>> U+D800..U+DFFF or U+110000..U+7FFFFFFF (or possibly even up to 2?? or
>>> 2??), which has its uses but is not well-formed Unicode.
>>
>> I'd be interested in your elaboration on what these uses are.
>
> Emacs uses some of that for supporting charsets that cannot be mapped
> into Unicode.  GB18030 is one example of such charsets.  The internal
> representation of characters in Emacs is UTF-8, so it uses 5-byte
> UTF-8 like sequences to represent such characters.

When 137,468 private-use characters aren't enough?
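Doug's figure is the sum of the three Private Use Area ranges. The Python check below adds up the ranges and cross-checks the total against the General_Category value Co in Python's bundled `unicodedata` database. The variable names are illustrative only.

```python
import unicodedata

# The three Private Use Area ranges (inclusive bounds).
PUA_RANGES = (
    (0xE000, 0xF8FF),      # BMP Private Use Area: 6,400 code points
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15): 65,534
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16): 65,534
)

by_ranges = sum(hi - lo + 1 for lo, hi in PUA_RANGES)

# Cross-check: count every code point whose General_Category is Co (private use).
by_category = sum(1 for cp in range(0x110000) if unicodedata.category(chr(cp)) == "Co")

print(by_ranges, by_category)  # 137468 137468
```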

I thought the whole premise of GB18030 was that it was Unicode mapped into a GB2312 framework. What characters exist in GB18030 that don't exist in Unicode, and have they been proposed for Unicode yet, and why was none of the PUA space considered appropriate for that in the meantime?

--
Doug Ewell | Thornton, CO, US | ewellic.org


From unicode at unicode.org  Sat Mar 21 15:26:24 2020
From: unicode at unicode.org (Eli Zaretskii via Unicode)
Date: Sat, 21 Mar 2020 22:26:24 +0200
Subject: Is the binaryness/textness of a data format a property?
In-Reply-To: <000701d5ffb7$93544850$b9fcd8f0$@ewellic.org> (doug@ewellic.org)
References: <000001d5ffa4$11d30860$35791920$@ewellic.org>
 <8336a1ecla.fsf@gnu.org> <000701d5ffb7$93544850$b9fcd8f0$@ewellic.org>
Message-ID: <831rple9ov.fsf@gnu.org>

> From: "Doug Ewell" <doug at ewellic.org>
> Cc: <unicode at unicode.org>
> Date: Sat, 21 Mar 2020 13:33:18 -0600
> 
> > Emacs uses some of that for supporting charsets that cannot be mapped
> > into Unicode.  GB18030 is one example of such charsets.  The internal
> > representation of characters in Emacs is UTF-8, so it uses 5-byte
> > UTF-8 like sequences to represent such characters.
> 
> When 137,468 private-use characters aren't enough?

Why is that relevant to the issue at hand?

> I thought the whole premise of GB18030 was that it was Unicode mapped into a GB2312 framework. What characters exist in GB18030 that don't exist in Unicode, and have they been proposed for Unicode yet

I don't remember off hand, but last time I looked at GB18030, there
were a lot of them not in Unicode.

> and why was none of the PUA space considered appropriate for that in the meantime?

Because many fonts already use them?  I don't really know why it was
decided to use codepoints above 0x1FFFFF, it's just that this is how
Emacs works for quite some time.  You asked for examples of usage, and
I provided one.

From unicode at unicode.org  Sat Mar 21 15:38:24 2020
From: unicode at unicode.org (Julian Bradfield via Unicode)
Date: Sat, 21 Mar 2020 20:38:24 +0000 (GMT)
Subject: Is the binaryness/textness of a data format a property?
References: <000001d5ffa4$11d30860$35791920$@ewellic.org>
 <8336a1ecla.fsf@gnu.org>
Message-ID: <slrnr7cuq0.r0k.jcb@home.stevens-bradfield.com>

On 2020-03-21, Eli Zaretskii via Unicode <unicode at unicode.org> wrote:
>> Date: Sat, 21 Mar 2020 11:13:40 -0600
>> From: Doug Ewell via Unicode <unicode at unicode.org>
>> 
>> Adam Borowski wrote:
>> 
>> > Also, UTF-8 can carry more than Unicode -- for example, U+D800..U+DFFF
>> > or U+110000..U+7FFFFFFF (or possibly even up to 2?? or 2??), which has
>> > its uses but is not well-formed Unicode.
>> 
>> I'd be interested in your elaboration on what these uses are.
>
> Emacs uses some of that for supporting charsets that cannot be mapped
> into Unicode.  GB18030 is one example of such charsets.  The internal
> representation of characters in Emacs is UTF-8, so it uses 5-byte
> UTF-8 like sequences to represent such characters.

My own (now >10 year old) Unicode adaptation of XEmacs does the same,
even for charsets that can be mapped into Unicode. To ensure complete
backward compatibility, it distinguishes "legacy" charsets from Unicode,
and only does conversion when requested.


From unicode at unicode.org  Sat Mar 21 15:57:42 2020
From: unicode at unicode.org (Doug Ewell via Unicode)
Date: Sat, 21 Mar 2020 14:57:42 -0600
Subject: Is the binaryness/textness of a data format a property?
In-Reply-To: <831rple9ov.fsf@gnu.org>
References: <000001d5ffa4$11d30860$35791920$@ewellic.org>
 <8336a1ecla.fsf@gnu.org> <000701d5ffb7$93544850$b9fcd8f0$@ewellic.org>
 <831rple9ov.fsf@gnu.org>
Message-ID: <000a01d5ffc3$5dfd3ac0$19f7b040$@ewellic.org>

Eli Zaretskii wrote:

>> When 137,468 private-use characters aren't enough?
>
> Why is that relevant to the issue at hand?

You're right. I did ask what the uses of non-standard UTF-8 were, and you gave me an example.

> I don't remember off hand, but last time I looked at GB18030, there
> were a lot of them not in Unicode.

I'd forgotten that there were still about two dozen GB18030 characters mapped, more or less officially, into the Unicode PUA. But again, I changed the subject. Sorry about that.

--
Doug Ewell | Thornton, CO, US | ewellic.org


From unicode at unicode.org  Sat Mar 21 19:31:31 2020
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Sun, 22 Mar 2020 00:31:31 +0000
Subject: Is the binaryness/textness of a data format a property?
In-Reply-To: <000701d5ffb7$93544850$b9fcd8f0$@ewellic.org>
References: <000001d5ffa4$11d30860$35791920$@ewellic.org>
 <8336a1ecla.fsf@gnu.org>
 <000701d5ffb7$93544850$b9fcd8f0$@ewellic.org>
Message-ID: <20200322003131.657f1f23@JRWUBU2>

On Sat, 21 Mar 2020 13:33:18 -0600
Doug Ewell via Unicode <unicode at unicode.org> wrote:

> Eli Zaretskii wrote:

> > Emacs uses some of that for supporting charsets that cannot be
> > mapped into Unicode.  GB18030 is one example of such charsets.  The
> > internal representation of characters in Emacs is UTF-8, so it uses
> > 5-byte UTF-8 like sequences to represent such characters.  

> When 137,468 private-use characters aren't enough?

But they aren't private use!  I haven't made any agreement with anyone
about using them.

Additionally, just as some people seem to think that stray UTF-16 code
units should be supported (occasionally even declaring UTF-8
implementations of Unicode standard algorithms to be automatically
non-compliant), there is a case for supporting stray UTF-8 code units.
Emacs supports the full range of 8-bit byte values - 128 unified with
ASCII and the other 128 with high bit set.
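Richard's point about stray code units can be illustrated with a technique from another system, not Emacs's own. Python's `surrogateescape` error handler (PEP 383) turns each undecodable byte 0x80..0xFF into a lone surrogate U+DC80..U+DCFF, so arbitrary bytes survive a decode/encode round trip. The helper names below are hypothetical. Emacs instead reserves its own character codes for raw bytes.

```python
def decode_lossless(data: bytes) -> str:
    """Decode UTF-8; each byte that cannot be decoded becomes a lone surrogate U+DC80..U+DCFF."""
    return data.decode("utf-8", errors="surrogateescape")


def encode_lossless(text: str) -> bytes:
    """Inverse of decode_lossless: those lone surrogates turn back into the original bytes."""
    return text.encode("utf-8", errors="surrogateescape")


raw = b"caf\xc3\xa9 \xff\xfe end"    # valid UTF-8 "café", then two stray bytes
text = decode_lossless(raw)
print(ascii(text))                   # 'caf\xe9 \udcff\udcfe end'
print(encode_lossless(text) == raw)  # True: the stray bytes survive the round trip
```

The decoded string is itself not well-formed Unicode. Encoding it with the plain `utf-8` codec (no error handler) raises `UnicodeEncodeError`, which echoes the thread's distinction between data carried by a UTF-8-like scheme and actual UTF-8.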

> What characters exist in GB18030 that don't
> exist in Unicode, and have they been proposed for Unicode yet, and
> why was none of the PUA space considered appropriate for that in the
> meantime?

Doesn't GB18030 appropriate some of the PUA for Tibetan (and quite
possibly other complex scripts)?  I haven't looked up how Emacs
handles this. 

Richard.

From unicode at unicode.org  Sun Mar 22 13:56:52 2020
From: unicode at unicode.org (Markus Scherer via Unicode)
Date: Sun, 22 Mar 2020 11:56:52 -0700
Subject: Is the binaryness/textness of a data format a property?
In-Reply-To: <000701d5ffb7$93544850$b9fcd8f0$@ewellic.org>
References: <000001d5ffa4$11d30860$35791920$@ewellic.org>
 <8336a1ecla.fsf@gnu.org>
 <000701d5ffb7$93544850$b9fcd8f0$@ewellic.org>
Message-ID: <CAN49p6q1-vtuDimGNYReEc85CBxjBTAzEcy0g690NH4gk8OD+A@mail.gmail.com>

On Sat, Mar 21, 2020 at 12:35 PM Doug Ewell via Unicode <unicode at unicode.org>
wrote:

> I thought the whole premise of GB18030 was that it was Unicode mapped into
> a GB2312 framework. What characters exist in GB18030 that don't exist in
> Unicode, and have they been proposed for Unicode yet, and why was none of
> the PUA space considered appropriate for that in the meantime?
>

My memory of GB18030 is that its code space has 1.6M code points, of which
1.1M are a permutation of Unicode. For the rest you would have to go beyond
the Unicode code space for 1:1 round-trip mappings.
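Markus's figures can be checked from the shape of GB18030 byte sequences. The arithmetic below assumes the commonly documented byte ranges:

- single bytes: 0x00..0x7F
- two-byte sequences: lead 0x81..0xFE, trail 0x40..0x7E or 0x80..0xFE
- four-byte sequences: bytes alternating 0x81..0xFE and 0x30..0x39

It counts every structurally valid sequence, assigned or not.

```python
# Structurally valid GB18030 byte sequences, by length.
ONE_BYTE = 0x80                    # 0x00..0x7F
TWO_BYTE = 126 * (63 + 127)        # lead 0x81..0xFE; trail 0x40..0x7E (63) or 0x80..0xFE (127)
FOUR_BYTE = 126 * 10 * 126 * 10    # 0x81..0xFE, 0x30..0x39, 0x81..0xFE, 0x30..0x39

gb18030_space = ONE_BYTE + TWO_BYTE + FOUR_BYTE
unicode_space = 0x110000           # U+0000..U+10FFFF

print(gb18030_space)                  # 1611668
print(unicode_space)                  # 1114112
print(gb18030_space - unicode_space)  # 497556
```

The difference of roughly half a million is the part of the GB18030 space that would need values beyond U+10FFFF for a 1:1 round trip.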

Just please don't call it UTF-8.

markus

From unicode at unicode.org  Sun Mar 22 18:29:03 2020
From: unicode at unicode.org (=?utf-8?B?TWFydGluIEouIETDvHJzdA==?= via Unicode)
Date: Sun, 22 Mar 2020 23:29:03 +0000
Subject: Is the binaryness/textness of a data format a property?
In-Reply-To: <CAN49p6q1-vtuDimGNYReEc85CBxjBTAzEcy0g690NH4gk8OD+A@mail.gmail.com>
References: <000001d5ffa4$11d30860$35791920$@ewellic.org>
 <8336a1ecla.fsf@gnu.org> <000701d5ffb7$93544850$b9fcd8f0$@ewellic.org>
 <CAN49p6q1-vtuDimGNYReEc85CBxjBTAzEcy0g690NH4gk8OD+A@mail.gmail.com>
Message-ID: <3eb6a9a0-ee7d-2650-157c-9ed02835edd8@it.aoyama.ac.jp>

On 23/03/2020 03:56, Markus Scherer via Unicode wrote:
> On Sat, Mar 21, 2020 at 12:35 PM Doug Ewell via Unicode <unicode at unicode.org>
> wrote:
> 
>> I thought the whole premise of GB18030 was that it was Unicode mapped into
>> a GB2312 framework. What characters exist in GB18030 that don't exist in
>> Unicode, and have they been proposed for Unicode yet, and why was none of
>> the PUA space considered appropriate for that in the meantime?
>>
> 
> My memory of GB18030 is that its code space has 1.6M code points, of which
> 1.1M are a permutation of Unicode. For the rest you would have to go beyond
> the Unicode code space for 1:1 round-trip mappings.

This matches my recollection. What's more, there are no characters 
allocated in the parts of the GB 18030 codespace that don't map to 
Unicode, and there is as far as I understand no plan to use that space. 
It's just there because that was the most straightforward way to extend 
GB 2312/GBK.
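Martin's point is visible in how GB18030 maps the supplementary planes. U+10000..U+10FFFF are assigned linearly to four-byte sequences starting at 0x90308130, so U+10FFFF lands on 0xE3329A35. The four-byte sequences after it, through 0xFE39FE39, have no Unicode counterpart.

The Python sketch below does three things:

- implements only that linear range (`gb18030_supplementary` is a hypothetical name)
- checks it against Python's built-in `gb18030` codec
- counts the unused tail

It does not cover the BMP part of the mapping.

```python
def gb18030_supplementary(cp: int) -> bytes:
    """Four-byte GB18030 sequence for U+10000..U+10FFFF via the linear mapping from 0x90308130."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF are mapped by this linear range")
    n = cp - 0x10000
    n, d4 = divmod(n, 10)   # fourth byte: 0x30..0x39
    n, d3 = divmod(n, 126)  # third byte: 0x81..0xFE
    n, d2 = divmod(n, 10)   # second byte: 0x30..0x39
    return bytes((0x90 + n, 0x30 + d2, 0x81 + d3, 0x30 + d4))


# Cross-check against Python's built-in gb18030 codec.
for cp in (0x10000, 0x1F600, 0x10FFFF):
    assert chr(cp).encode("gb18030") == gb18030_supplementary(cp)

# Four-byte sequences whose first byte is 0x90..0xFE, minus those used by U+10000..U+10FFFF.
upper_region = (0xFE - 0x90 + 1) * 10 * 126 * 10
unused_tail = upper_region - (0x10FFFF - 0x10000 + 1)

print(gb18030_supplementary(0x10000).hex().upper())   # 90308130
print(gb18030_supplementary(0x10FFFF).hex().upper())  # E3329A35
print(unused_tail)                                    # 350024
```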

Regards,   Martin.


From unicode at unicode.org  Mon Mar 23 17:29:57 2020
From: unicode at unicode.org (wjgo_10009@btinternet.com via Unicode)
Date: Mon, 23 Mar 2020 22:29:57 +0000 (GMT)
Subject: Base character plus tag sequences (from RE: Is the
 binaryness/textness of a data format a property?)
Message-ID: <59f9f4cc.1054.17109844d3a.Webtop.71@btinternet.com>


Doug Ewell wrote:
> When 137,468 private-use characters aren't enough?
In my opinion, a base character plus tag sequence has the potential to 
be used for many large-scale applications in the future.
A base character plus tag sequence encoding has an advantage over a 
Private Use Area encoding (except for prompt experimental use or for 
some applications): the encoding can be unique, and thus 
interoperability is possible amongst people generally.
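For readers unfamiliar with the mechanism: a tag sequence is a base character followed by tag characters U+E0020..U+E007E, which mirror printable ASCII 0x20..0x7E. The sequence is terminated by CANCEL TAG U+E007F.

The Python sketch below builds and splits such sequences. The helper names are hypothetical. The example is the existing flag of England sequence (U+1F3F4 WAVING BLACK FLAG with the tag payload "gbeng"), not a QID emoji, whose exact form is the subject of PRI 408.

```python
from __future__ import annotations

TAG_OFFSET = 0xE0000        # tag character = U+E0000 + the ASCII code it mirrors
CANCEL_TAG = "\U000E007F"   # terminates a tag sequence


def build_tag_sequence(base: str, payload: str) -> str:
    """Append the payload as tag characters (U+E0020..U+E007E), then CANCEL TAG."""
    if not all(0x20 <= ord(c) <= 0x7E for c in payload):
        raise ValueError("tag payload must be printable ASCII")
    return base + "".join(chr(TAG_OFFSET + ord(c)) for c in payload) + CANCEL_TAG


def parse_tag_sequence(seq: str) -> tuple[str, str]:
    """Split a tag sequence back into (base, ASCII payload)."""
    if not seq.endswith(CANCEL_TAG):
        raise ValueError("missing CANCEL TAG")
    body = seq[:-1]
    start = len(body)
    while start > 0 and 0xE0020 <= ord(body[start - 1]) <= 0xE007E:
        start -= 1  # walk back over the run of tag characters
    return body[:start], "".join(chr(ord(c) - TAG_OFFSET) for c in body[start:])


# Flag of England: U+1F3F4 WAVING BLACK FLAG + tags "gbeng" + CANCEL TAG.
england = build_tag_sequence("\U0001F3F4", "gbeng")
print(" ".join(f"U+{ord(c):04X}" for c in england))
# U+1F3F4 U+E0067 U+E0062 U+E0065 U+E006E U+E0067 U+E007F
print(parse_tag_sequence(england)[1])  # gbeng
```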

QID emoji are just the very start of the applications, some not even 
dreamed of yet, for which a base character plus tag sequence encoding 
could be used.

Once the restriction that a specific encoding may only result in a fixed 
image is removed, new information technology applications will become 
possible within text streams.

There is the QID Emoji Public Review, and issues like this can be 
explored there, so that they will be before the Unicode Technical 
Committee when it assesses the responses to the public review.

In my response of Monday 2 March 2020, I put forward an idea that could 
allow QID emoji to proceed, yet without the disadvantages.

As of the time of sending this post, no comment submitted after mine has 
been published.

https://www.unicode.org/review/pri408/

Whatever your view on whether such ideas should be allowed to flourish 
and become mainstream in the future, I opine that it would be good for 
there to be more responses to the public review, so that as wide a range 
of views as possible is before the Unicode Technical Committee when it 
assesses them: not just on QID emoji as such, but on whether the 
underlying method of encoding large sets of items as base character plus 
tag character sequences should be encouraged.

William Overington

Monday 23 March 2020



From unicode at unicode.org  Tue Mar 31 13:22:37 2020
From: unicode at unicode.org (wjgo_10009@btinternet.com via Unicode)
Date: Tue, 31 Mar 2020 19:22:37 +0100 (BST)
Subject: How is meaning changed by context and typography - in art, emoji
 and language
Message-ID: <2ccfdccc.1f4b.17131d4bc09.Webtop.73@btinternet.com>

I received a circulated email from MoMA, the Museum of Modern Art in New 
York. I am, at my request, on their mailing list.

There is a link to a web page.

https://www.moma.org/magazine/articles/257

There is a video embedded in the web page, 8 minutes.

I watched the video and found it interesting.

There is one part where two identical images each have a different 
title.

I noticed that both titles were in English.

With typography today, it has become almost obligatory, when a new emoji 
character is proposed for encoding, for the character to be suggested as 
having multiple possible meanings, possibly linked to context, or perhaps 
regardless of context.

The beginnings of this phenomenon, and the problems of ambiguity of 
meaning of emoji characters, were discussed in a talk at the Unicode 
conference in 2015.

https://www.youtube.com/watch?v=9ldSVbXbjl4

There was mention of the possibility of "precise emoji".

Yet these days imprecision of emoji meaning has become widespread. Has 
the possibility of QID emoji brought back the possibility of precise 
emoji? Decoding could be to an image, to language-localized speech, to 
language-localized text, or even to all three at once. Yet that is only 
possible if QID emoji are allowed to flourish, perhaps after a few 
careful modifications to the original proposal so as to minimize, or at 
least limit, the possibility of encoding chaos.

I have long been fascinated by what I regard as subtle changes of 
meaning that setting a piece of text in different fonts produces, though 
some other people opine that the meaning is unchanged, regardless of the 
font.

Also, can some meanings not be translated from one language to another? 
If so, is that due to the nature of the languages or of the culture where 
the original text was produced, or some of each? Does the general shape 
of the way that a particular script has developed reflect, or influence, 
the original literature written in that script? Do words that rhyme in 
one language produce imagery that does not arise in a language where 
their translations do not rhyme? For example, boaco and erinaco rhyme in 
Esperanto, yet their English translations, reindeer and hedgehog, do 
not rhyme.

The art works in the MoMA video also reminded me of something that was 
in this mailing list probably in the early 2000s.

The post was about translations linked to an art project.

It was an art project about some orange blocks and people were taking 
photographs of art works where one of the orange blocks was presented in 
some context.

Maybe it was a student project, I don't know.

I have looked on the web and have so far found nothing about it, not 
even the original post in this mailing list.

Since then technology has changed a lot; much more is now possible for 
more people. There are now widespread emoji, there is Google Street 
View, and so on.

New art possibilities.

Does anyone else remember the orange blocks please? Maybe an interesting 
stepping stone in the history of art.

William Overington

Tuesday 31 March 2020