From mark at kli.org Mon Apr 1 16:39:04 2024
From: mark at kli.org (Mark E. Shoulson)
Date: Mon, 1 Apr 2024 17:39:04 -0400
Subject: HEBREW HE-WITH-ADNY-INSIDE
Message-ID:

Looking waaaay back to my opus (with Michael Everson) of 1998, http://std.dkuug.dk/jtc1/sc2/wg2/docs/n1740/n1740.htm, I call to attention one particular case mentioned there: the case where the second HEBREW LETTER HE of the Tetragrammaton is made very wide and another Holy Name (Adonay, ALEF-DALET-NUN-YOD) is printed in smaller letters inside it. As mentioned last century, this is even now (well, then) commonly met with, especially in Sephardic prayer books.

I mention it because I've found a bunch of professional Hebrew fonts which have a glyph for this special character. Take a look at any one of many (but not all) of the offerings of the Samtype Foundry at https://www.myfonts.com/collections/samtype-foundry and you'll see what I mean. Sometimes it's visible in the sample image, sometimes it isn't even though it's in the font. They seem to be placing the glyph at codepoint U+FB50, which is ARABIC LETTER ALEF WASLA ISOLATED FORM, probably because it's the next character after the extended Hebrew code-block that ends at U+FB4F HEBREW LIGATURE ALEF LAMED and because, being in an Arabic codeblock, it has RTL directionality (while the PUA I think has LTR directionality, which is most inconvenient.)

So it seems that this really is a thing being used by typefounders even now. Probably should be encoded, yes? My rationale from 1998 of encoding the Tetragrammaton as a glyph in itself was apparently not accepted, though after a later paper, https://unicode.org/L2/L2015/15092-hebew-nomina-sacra.pdf and some discussion, the YOD TRIANGLE U+05EF was encoded. Perhaps this should be too? I guess as a variant of HE perhaps?

(the name in the subject-header is not meant as a serious proposal for the glyph-name, though this letter is actually serious, despite the date.)

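The directionality point in the message above can be checked against the Unicode Character Database. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `describe` and the labels are illustrative only, and the values reported depend on the Unicode version bundled with the Python runtime in use.

```python
import unicodedata

# Code points mentioned in the message; the labels are only for display.
CODE_POINTS = {
    "ordinary HE": 0x05D4,
    "YOD TRIANGLE": 0x05EF,
    "last Hebrew presentation form": 0xFB4F,
    "slot the fonts reportedly use": 0xFB50,
    "first BMP private-use code point": 0xE000,
}


def describe(cp: int) -> tuple[str, str]:
    """Return (character name, bidi class) for a code point."""
    ch = chr(cp)
    # Private-use characters have no name, so supply a default.
    return unicodedata.name(ch, "<no name>"), unicodedata.bidirectional(ch)


for label, cp in CODE_POINTS.items():
    name, bidi = describe(cp)
    print(f"U+{cp:04X}  {bidi:<2}  {name}  ({label})")
```

U+FB50 is reported with bidi class AL (Arabic letter, right-to-left) and U+E000 with class L (left-to-right), which matches the observation that a private-use code point would by default be laid out in the wrong direction for Hebrew text.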
~mark

From ruben at arakelyan.uk Wed Apr 3 05:14:17 2024
From: ruben at arakelyan.uk (Ruben Arakelyan)
Date: Wed, 3 Apr 2024 11:14:17 +0100
Subject: Deciding between a character and emoji proposal
Message-ID:

I was idly perusing the Unicode character set for religious cross symbols recently and noticed that the Armenian cross (https://en.wikipedia.org/wiki/Armenian_Cross) is not included either as a character or as an emoji. I am minded to submit a proposal but am hoping beforehand to get some kind of steer (however informal and non-binding) about whether it would be better to propose it as a character or emoji.

There are a number of religious symbols encoded as characters, including various Christian cross styles from different denominations. These may have been "grandfathered" in to the current guidelines, however, since they don't always seem to fit the categorisation of being used primarily in runs of text etc. There are also two cross styles (the Latin and Orthodox crosses) that are encoded as emoji, which would to my mind seem more appropriate, although these are duplicated from existing character encodings.

I am trying to work out whether my new proposal would fit better in the larger existing list of crosses encoded as characters, or if the steer now is that these types of symbols are better encoded as emoji.

Any thoughts would be greatly appreciated.

Ruben Arakelyan

From pgcon6 at msn.com Wed Apr 3 21:40:00 2024
From: pgcon6 at msn.com (Peter Constable)
Date: Thu, 4 Apr 2024 02:40:00 +0000
Subject: What is the ASCII table and How Do You Use It?
Message-ID:

I happened upon this article, apparently published a (relatively) recent nine months ago. It's written as though ASCII is supposed to be news to many people.

What Is the ASCII Table and How Do You Use It? (msn.com)

It strikes me as something meant for April Fool's Day, but the dates don't line up.

Peter

From olopierpa at gmail.com Wed Apr 3 22:12:22 2024
From: olopierpa at gmail.com (Pierpaolo Bernardi)
Date: Thu, 4 Apr 2024 05:12:22 +0200
Subject: What is the ASCII table and How Do You Use It?
In-Reply-To:
References:
Message-ID:

It's not like nowadays people are born with innate knowledge of ASCII just because it's old...

On Thu, Apr 4, 2024 at 4:43 AM Peter Constable via Unicode wrote:
>
> I happened upon this article, apparently published a (relatively) recent nine months ago. It's written as though ASCII is supposed to be news to many people.
>
> What Is the ASCII Table and How Do You Use It? (msn.com)
>
> It strikes me as something meant for April Fool's Day, but the dates don't line up.
>
> Peter

From doug at ewellic.org Wed Apr 3 23:07:31 2024
From: doug at ewellic.org (Doug Ewell)
Date: Thu, 4 Apr 2024 04:07:31 +0000
Subject: What is the ASCII table and How Do You Use It?
In-Reply-To:
References:
Message-ID:

Peter Constable wrote:

> I happened upon this article, apparently published a (relatively)
> recent nine months ago. It's written as though ASCII is supposed to be
> news to many people.
>
> https://www.msn.com/en-us/news/technology/what-is-the-ascii-table-and-how-do-you-use-it/ar-AA1cGkw8?ocid=msedgntp&pc=DCTS&cvid=16572016691745b6984a140583a7ce3f&ei=57

Probably my least favorite part of such tables, both forty years ago and today, is their blind repetition of cryptic abbreviations for C0 control functions like SYN and ETB and EM, with no explanation of their expansion or use.

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org

From sosipiuk at gmail.com Thu Apr 4 00:36:33 2024
From: sosipiuk at gmail.com (Sławomir Osipiuk)
Date: Thu, 04 Apr 2024 05:36:33 +0000
Subject: What is the ASCII table and How Do You Use It?
In-Reply-To:
References:
Message-ID: <1712208706580.1334668722.4039834894@gmail.com>

There's a very appropriate XKCD for this: https://xkcd.com/2501/

On Wednesday, 03 April 2024, 22:40:00 (-04:00), Peter Constable via Unicode wrote:

> I happened upon this article, apparently published a (relatively) recent nine months ago. It's written as though ASCII is supposed to be news to many people.
>
> What Is the ASCII Table and How Do You Use It? (msn.com)
>
> It strikes me as something meant for April Fool's Day, but the dates don't line up.
>
> Peter

From unicode at lindenbergsoftware.com Thu Apr 4 06:28:59 2024
From: unicode at lindenbergsoftware.com (Norbert Lindenberg)
Date: Thu, 4 Apr 2024 13:28:59 +0200
Subject: What is the ASCII table and How Do You Use It?
In-Reply-To:
References:
Message-ID: <3236FF5F-8B6D-427E-AE35-AA1C387AA921@lindenbergsoftware.com>

It's clickbait:
https://en.wikipedia.org/wiki/Clickbait

With the proliferation of LLMs, expect a lot more stuff like this.

Norbert

> On Apr 4, 2024, at 04:40, Peter Constable via Unicode wrote:
>
> I happened upon this article, apparently published a (relatively) recent nine months ago. It's written as though ASCII is supposed to be news to many people.
> What Is the ASCII Table and How Do You Use It? (msn.com)
> It strikes me as something meant for April Fool's Day, but the dates don't line up.
> Peter

From wjgo_10009 at btinternet.com Thu Apr 4 02:14:55 2024
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Thu, 4 Apr 2024 08:14:55 +0100 (BST)
Subject: What is the ASCII table and How Do You Use It?
In-Reply-To:
References:
Message-ID: <1bc6023c.9f9c.18ea7f5ebc1.Webtop.117@btinternet.com>

It is interesting that the printing ASCII characters are the characters that are the underlying characters of most of the tag characters of plane 14.
So any use of tags in present or future Unicode encoding is effectively restricted to ASCII characters.

Reading about ASCII in this thread reminded me of when, in the early 1990s, I devised a way to express Esperanto text using 7-bit ASCII. I think I got it working, just locally, programmed in Pascal.

As far as I am aware the system has never been applied in practice and has been overtaken by technological advances, yet perhaps will have some interest as to what was done many years ago with the technology available at the time.

Yet one aspect of that exercise in encoding information remains, namely the concept of the software unicorn. I wrote about software unicorns in a story that I wrote in 1998, the story detailing the encoding system that I mentioned earlier in this post. The story was placed on the web at that time and is still there now.

The web page includes two illustrations of software unicorns, constructed by adapting clip art from Microsoft Office at that time. I ungrouped one clip art picture so as to place a software unicorn between two layers of the clip art image. I used the PowerPoint program to do that.

http://www.users.globalnet.co.uk/~ngo/euto0008.htm

There was also a software unicorn screensaver, also constructed by adapting Microsoft Office clip art.

http://www.users.globalnet.co.uk/~ngo/euto2001.htm

It was great as the software unicorns went across the screen slowly, each at its own pace, so there was an everchanging display. I have not been able to get it to work on later computers.

Yet ASCII still underlies much of the internet. For example, some years later I wrote the following about software unicorns. The text includes an accented character, an a circumflex, and I am not sure whether this email system will send that character correctly, and if it does, will the accented character appear correctly in the record of this post in the archive of the Unicode mailing list.
Hopefully it will all work, but if it is not ASCII then there is sometimes the possibility that things will not work properly.

A castle of software in imagery seen
as a château in turquoise and three shades of green:
yet a castle of software can fall to the ground
if over its drawbridge their golden hooves pound

More recently, I have included the software unicorns in my first novel, mostly in Chapter 21, and in a song in Chapter 60.

I am using sequences of ASCII characters, each code being an exclamation mark followed by digits, in my research on communicating through the language barrier in some particular circumstances. I am deliberately only using ASCII characters for this as I hope that at some future time the Unicode Technical Committee will encode what are at present ASCII codes as tag sequences in Unicode thereby helping my research to become widely applied as such an encoding will add interoperability and avoid ambiguity and avoid any concerns over using the system related to concerns about perceived intellectual property issues.

William Overington

Thursday 4 April 2024

http://www.users.globalnet.co.uk/~ngo/

From wjgo_10009 at btinternet.com Fri Apr 5 15:31:44 2024
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Fri, 5 Apr 2024 21:31:44 +0100 (BST)
Subject: Tags and emoji
Message-ID: <35545470.bbb1.18eaff5ca02.Webtop.117@btinternet.com>

Unicode regularly encodes some emoji, each with its own code point. Yet, as there is a quota each year, some acceptable requested emoji do not become encoded at all.

Tag sequences are used for the flags of England, Wales, and Scotland.

The QID Emoji proposal, which was not accepted, possibly because it would have meant some emoji being given Unicode status without prior oversight by the Unicode Technical Committee, nevertheless did include a good encoding format that applied tag digit characters.

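For readers unfamiliar with the mechanism mentioned above, here is a minimal sketch of how a tag sequence is built. The tag characters at U+E0020..U+E007E mirror printable ASCII at a fixed offset of 0xE0000, and the England, Scotland, and Wales flags are encoded as U+1F3F4 WAVING BLACK FLAG followed by tag characters spelling a subdivision code and a closing U+E007F CANCEL TAG. The function names `to_tags`, `from_tags`, and `subdivision_flag` are hypothetical helpers for illustration, not part of any library.

```python
TAG_OFFSET = 0xE0000   # tag character = ASCII code + this offset
CANCEL_TAG = 0xE007F   # terminates an emoji tag sequence
BLACK_FLAG = 0x1F3F4   # WAVING BLACK FLAG, base of subdivision flags


def to_tags(ascii_text: str) -> str:
    """Map printable ASCII (U+0020..U+007E) to the matching tag characters."""
    for ch in ascii_text:
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"not printable ASCII: {ch!r}")
    return "".join(chr(TAG_OFFSET + ord(ch)) for ch in ascii_text)


def from_tags(text: str) -> str:
    """Recover the ASCII payload from any tag characters in the text."""
    return "".join(
        chr(ord(ch) - TAG_OFFSET)
        for ch in text
        if 0xE0020 <= ord(ch) <= 0xE007E
    )


def subdivision_flag(code: str) -> str:
    """Build a flag tag sequence for a subdivision code such as 'gbeng'."""
    return chr(BLACK_FLAG) + to_tags(code) + chr(CANCEL_TAG)


england = subdivision_flag("gbeng")
print(" ".join(f"U+{ord(ch):04X}" for ch in england))
print(from_tags(england))
```

The same `to_tags` helper applied to a digit string such as "12345" yields tag digit characters (U+E0030..U+E0039), which is the kind of payload a digit-based tag format would carry. The sketch does not attempt to reproduce the exact format of the QID Emoji proposal.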
So I write to seek opinions please on whether it would be a good idea that that tag format could be applied so as to uniquely encode all those acceptable emoji that have been formally proposed to Unicode Inc. yet have not been selected for the annual quota.

The emoji thus encoded might not become implemented by mainstream platform businesses yet there could be good opportunities for independent artists and fontmakers and would go some way to bringing a good result to the proposers of otherwise unencoded emoji proposals.

If there were a practice that fonts supporting in whole or in part such emoji had visible glyphs for the tag digit characters, an unsupported tag emoji would be indicated by the displayed digit sequence.

This would be a far better solution than using a Private Use Area encoding.

William Overington

Friday 5 April 2024

From list+unicode at jdlh.com Fri Apr 5 16:38:36 2024
From: list+unicode at jdlh.com (Jim DeLaHunt)
Date: Fri, 5 Apr 2024 14:38:36 -0700
Subject: Tags and emoji
In-Reply-To: <35545470.bbb1.18eaff5ca02.Webtop.117@btinternet.com>
References: <35545470.bbb1.18eaff5ca02.Webtop.117@btinternet.com>
Message-ID: <1de088aa-5bbb-4c33-91fb-854a189f533b@jdlh.com>

On 2024-04-05 13:31, William_J_G Overington via Unicode wrote:
>
> So I write to seek opinions please on whether it would be a good idea
> that that tag format could be applied so as to uniquely encode all
> those acceptable emoji that have been formally proposed to Unicode
> Inc. yet have not been selected for the annual quota.
>
> The emoji thus encoded might not become implemented by mainstream
> platform businesses yet there could be good opportunities for
> independent artists and fontmakers and would go some way to bringing a
> good result to the proposers of otherwise unencoded emoji proposals.
> If there were a practice that fonts supporting in whole or in part
> such emoji had visible glyphs for the tag digit characters, an
> unsupported tag emoji would be indicated by the displayed digit sequence.
>

In my humble opinion, no, this would not be a good idea.

The underlying issue is that people want to mix pictures with their text. You are pointing out all the ways that using the mechanism of text to deliver the pictures to the text stream is difficult. One must persuade the UTC to encode the picture as an emoji. Platform businesses must support the emoji in fonts and input methods. Font makers must add the picture to their fonts. Users must learn that the picture is available as an emoji, and use it. Because of all this external cost, the UTC encoding process appropriately includes high barriers to entry.

Why not take all that energy, and put it towards encouraging application developers to provide ways to mix pictures as pictures into the text stream? Sometimes these pictures are called "seals", or "stamps", or "reactions". Once the application developer allows users to insert arbitrary pictures into the text stream, then users can directly ask the independent artists for images matching the proposed emoji which did not meet the annual quota, and use them immediately. There need be no wait for an encoding process, or platform support, or anything else.

Why is it so terribly important to use the mechanism of text to deliver pictures in text, instead of using an application-based mechanism of mixed text and pictures?

Best regards,
    Jim DeLaHunt, Vancouver, Canada

--
--Jim DeLaHunt, jdlh at jdlh.com http://blog.jdlh.com/ (http://jdlh.com/)
multilingual websites consultant, Vancouver, B.C., Canada

From wjgo_10009 at btinternet.com Sat Apr 6 16:27:28 2024
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Sat, 6 Apr 2024 22:27:28 +0100 (BST)
Subject: Images in plain text (from Re: Tags and emoji)
In-Reply-To: <35545470.bbb1.18eaff5ca02.Webtop.117@btinternet.com>
References: <35545470.bbb1.18eaff5ca02.Webtop.117@btinternet.com>
Message-ID: <73075d0f.c526.18eb54f2c10.Webtop.117@btinternet.com>

Jim DeLaHunt wrote as follows.

> Why not take all that energy, and put it towards encouraging
> application developers to provide ways to mix pictures as pictures
> into the text stream?

In recent years there have been a few suggestions (by others, not me) in documents in the Unicode Technical Committee Document Register for such systems. If I remember correctly, at least one involved using tag characters. As far as I am aware, none have gone forward.

Reading your post I remembered that over twenty years ago I put forward in this mailing list a suggestion for what I called a .uof file. Trying to find it, as yet unsuccessfully, I found that .uof is now used as a suffix in an entirely different system, an office software system, so if my idea were to become implemented a different file extension would be needed.

If I remember correctly, my .uof file suggestion was such that if the plain text file that it accompanied had n uses of the character

U+FFFC OBJECT REPLACEMENT CHARACTER

then the .uof file would have n lines of text, each line of text containing the name of a graphics file, either just a file name for a local file or a URL (Uniform Resource Locator) for a file obtainable from the web, listed in the order that the corresponding U+FFFC character for the graphics file appeared in the plain text file that the .uof file accompanied.

Page 33 of https://www.unicode.org/versions/Unicode15.0.0/ch23.pdf has some notes about U+FFFC.

In those days the Unicode Technical Committee Document Register was not publicly available.
After it became publicly available to read I remember that I found that my suggestion of a .uof file had been discussed at a meeting of the Unicode Technical Committee.

So there are various ways to include graphics in, or accompanying and linked to, plain text file content that have been suggested.

If there is a will by the Unicode Technical Committee to go forward and have such a capability agreed and specified in a Unicode Technical Specification then there are various ideas for achieving a result that have already been put forward, and other ideas might well be devised too.

As for the possibility of me encouraging application developers to develop systems, well, I am retired and I could not credibly approach them suggesting they spend time and effort implementing my ideas unless I were in a position to pay them to do it. Yet if Unicode Inc. encoded the best system that can be devised, then maybe application developers would choose to take up that system and implement it, and progress would be achieved.

> Why is it so terribly important to use the mechanism of text to
> deliver pictures in text, instead of using an application-based
> mechanism of mixed text and pictures?

As far as I am aware, it is a matter of interoperability amongst various platforms and the fact that emoji are used inline with text, at various places within the text, not all together in the style of a diagram accompanying the text.

William Overington

Saturday 6 April 2024

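The sidecar-file idea recalled in the message above (n occurrences of U+FFFC in the text, n lines in an accompanying file, matched in order) can be sketched in a few lines. This is only an illustration reconstructed from that recollection, not the original proposal: the function name `pair_objects`, the choice to skip blank lines, and the example file names and URL are all assumptions.

```python
OBJECT_REPLACEMENT = "\uFFFC"  # U+FFFC OBJECT REPLACEMENT CHARACTER


def pair_objects(text: str, sidecar: str) -> list[tuple[int, str]]:
    """Pair each U+FFFC in `text` with the matching line of `sidecar`.

    Returns (index in text, file name or URL) tuples in document order.
    Blank sidecar lines are ignored (an assumption of this sketch).
    """
    refs = [line.strip() for line in sidecar.splitlines() if line.strip()]
    positions = [i for i, ch in enumerate(text) if ch == OBJECT_REPLACEMENT]
    if len(refs) != len(positions):
        raise ValueError(
            f"{len(positions)} U+FFFC characters but {len(refs)} sidecar lines"
        )
    return list(zip(positions, refs))


# Hypothetical example data.
text = "Cat \uFFFC and dog \uFFFC."
sidecar = "cat.png\nhttps://example.org/dog.png\n"

for position, ref in pair_objects(text, sidecar):
    print(position, ref)
```

Because the pairing is purely positional, inserting or deleting a U+FFFC in the text without updating the sidecar shifts every later image, which is one reason such out-of-band schemes need the count check shown here.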
URL: From asmusf at ix.netcom.com Sun Apr 7 04:34:15 2024 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Sun, 7 Apr 2024 02:34:15 -0700 Subject: Images in plain text (from Re: Tags and emoji) In-Reply-To: <73075d0f.c526.18eb54f2c10.Webtop.117@btinternet.com> References: <35545470.bbb1.18eaff5ca02.Webtop.117@btinternet.com> <73075d0f.c526.18eb54f2c10.Webtop.117@btinternet.com> Message-ID: Except for the narrow case of popular pictographs (read emoji) there's been a clear consensus that including images in a message or document is best realized with out-of-band information (or rich text formats that are not plain text, whether or not they have a plain-text source code format). Anything short of a multi-vendor effort is unlikely to change that status quo, so all these schemes represent curiosities at best (and discussing them mainly has entertainment value, if that). A./ On 4/6/2024 2:27 PM, William_J_G Overington via Unicode wrote: > > Jim DeLaHunt wrote as follows. > > > Why not take all that energy, and put it towards encouraging application developers to > provide ways to mix pictures as pictures into the text stream? > > In recent years there have been a few suggestions (by others, not me) > in documents in the Unicode Technical Committee Document Register for > such systems. If I remember correctly, at least one involved using tag > characters. As far as I am aware, none have gone forward. > > Reading your post I remembered that over twenty years ago I put > forward in this mailing list a suggestion for what I called a .uof > file. Trying to find it, as yet unsuccessfully, I found that .uof is > now used as a suffix in an entirely different system, an office > software system, so if my idea were to become implemented a different > file extension would be needed. 
> > If I remember correctly, my .uof file suggestion was such that if the > plain text file that it accompanied had n uses of the character > > U+FFFC OBJECT REPLACEMENT CHARACTER > > then the .uof file would have n lines of text, each line of text > containing the name of a graphics file, either just a file name for a > local file or a URL (Uniform Resource Locator) for a file obtainable > from the web, listed in the order that the corresponding U+FFFC > character for the graphics file appeared in the plain text file that > the .uof file accompanied. > > Page 33 of https://www.unicode.org/versions/Unicode15.0.0/ch23.pdf > has some > notes about U+FFFC. > > In those days the Unicode Technical Committe Document Register was not > publicly available. After it became publicly available to read I > remember that I found that my suggestion of a .uof file had been > discussed at a meeting of the Unicode Technical Committee. > > So there are various ways to include graphics in, or accompaying and > linked to, plain text file content that have been suggested. > > If there is a will by the Unicode Technical Committee to go forward > and have such a capability agreed and specified in a Unicode Technical > Specification then there are various ideas for achieveing a result > that have already been put forward, and other ideas maight well be > devised too. > > As for the possibility of me encouraging application developers to > develop systems, well, I am retired and I could not credibly approach > them suggesting they spend time and effort implementing my ideas > unless I were in a position to pay them to do it. Yet if Unicode Inc. > encoded the best system that can be devised, then maybe application > developers would choose to take up that system and implement it, and > progress would be achieved. > > > Why is it so terribly important to use the mechanism of text to deliver pictures in text, > instead of using a application-based mechanism of mixed text and pictures? 
> > As far as I am aware, it is a matter of interoperability amongst > various platforms and the fact that emoji are used inline with text, > at various places within the text, not all together in the style of a > diagram accompanying the text. > > William Overington > > Saturday 6 April 2024 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gwidion at gmail.com Sun Apr 7 20:44:16 2024 From: gwidion at gmail.com (Joao S. O. Bueno) Date: Sun, 7 Apr 2024 22:44:16 -0300 Subject: Images in plain text (from Re: Tags and emoji) In-Reply-To: References: <35545470.bbb1.18eaff5ca02.Webtop.117@btinternet.com> <73075d0f.c526.18eb54f2c10.Webtop.117@btinternet.com> Message-ID: On Sun, Apr 7, 2024 at 6:39?AM Asmus Freytag via Unicode wrote: > > Except for the narrow case of popular pictographs (read emoji) there's been a clear consensus that including images in a message or document is best realized with out-of-band information (or rich text formats that are not plain text, whether or not they have a plain-text source code format). Actually, for certain kinds of image all the "out of band" information needed is that the text should be rendered with a monospaced font, with no spacing around characters - with that, images can be rendered using block characters, 1/4 blocks and sextant characters included in the Vintage block, or even take in account glyph shapes for interesting "ASCII art". Oh, yes, this also needs control characters for at least "new line" and "carriage return" to go in band (not the case for HTML rendered text for example) > > Anything short of a multi-vendor effort is unlikely to change that status quo, so all these schemes represent curiosities at best (and discussing them mainly has entertainment value, if that). > > A./ > > > On 4/6/2024 2:27 PM, William_J_G Overington via Unicode wrote: > > Jim DeLaHunt wrote as follows. 
> > > Why not take all that energy, and put it towards encouraging application developers to provide ways to mix pictures as pictures into the text stream? > > In recent years there have been a few suggestions (by others, not me) in documents in the Unicode Technical Committee Document Register for such systems. If I remember correctly, at least one involved using tag characters. As far as I am aware, none have gone forward. > > Reading your post I remembered that over twenty years ago I put forward in this mailing list a suggestion for what I called a .uof file. Trying to find it, as yet unsuccessfully, I found that .uof is now used as a suffix in an entirely different system, an office software system, so if my idea were to become implemented a different file extension would be needed. > > If I remember correctly, my .uof file suggestion was such that if the plain text file that it accompanied had n uses of the character > > U+FFFC OBJECT REPLACEMENT CHARACTER > > then the .uof file would have n lines of text, each line of text containing the name of a graphics file, either just a file name for a local file or a URL (Uniform Resource Locator) for a file obtainable from the web, listed in the order that the corresponding U+FFFC character for the graphics file appeared in the plain text file that the .uof file accompanied. > > Page 33 of https://www.unicode.org/versions/Unicode15.0.0/ch23.pdf has some notes about U+FFFC. > > In those days the Unicode Technical Committe Document Register was not publicly available. After it became publicly available to read I remember that I found that my suggestion of a .uof file had been discussed at a meeting of the Unicode Technical Committee. > > So there are various ways to include graphics in, or accompaying and linked to, plain text file content that have been suggested. 
> > If there is a will by the Unicode Technical Committee to go forward and have such a capability agreed and specified in a Unicode Technical Specification then there are various ideas for achieveing a result that have already been put forward, and other ideas maight well be devised too. > > As for the possibility of me encouraging application developers to develop systems, well, I am retired and I could not credibly approach them suggesting they spend time and effort implementing my ideas unless I were in a position to pay them to do it. Yet if Unicode Inc. encoded the best system that can be devised, then maybe application developers would choose to take up that system and implement it, and progress would be achieved. > > > Why is it so terribly important to use the mechanism of text to deliver pictures in text, instead of using a application-based mechanism of mixed text and pictures? > > As far as I am aware, it is a matter of interoperability amongst various platforms and the fact that emoji are used inline with text, at various places within the text, not all together in the style of a diagram accompanying the text. > > William Overington > > Saturday 6 April 2024 > > > > > > From asmusf at ix.netcom.com Mon Apr 8 12:07:06 2024 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Mon, 8 Apr 2024 10:07:06 -0700 Subject: Images in plain text (from Re: Tags and emoji) In-Reply-To: References: <35545470.bbb1.18eaff5ca02.Webtop.117@btinternet.com> <73075d0f.c526.18eb54f2c10.Webtop.117@btinternet.com> Message-ID: On 4/7/2024 6:44 PM, Joao S. O. Bueno via Unicode wrote: > On Sun, Apr 7, 2024 at 6:39?AM Asmus Freytag via Unicode > wrote: >> Except for the narrow case of popular pictographs (read emoji) there's been a clear consensus that including images in a message or document is best realized with out-of-band information (or rich text formats that are not plain text, whether or not they have a plain-text source code format). 
> Actually, for certain kinds of image all the "out of band" > information needed is that the text should be rendered with a > monospaced font, with no spacing around characters - with that, images > can be rendered using block characters, ... there's always the compatibility / legacy exception for everything in Unicode :) But invoking that also means no future items will be encoded as there's no existing legacy to match. A./ > 1/4 blocks and sextant characters included in the Vintage block, or > even take in account glyph shapes for interesting "ASCII art". Oh, > yes, this also needs control characters for at least "new line" and > "carriage return" to go in band (not the case for HTML rendered text > for example) > > >> Anything short of a multi-vendor effort is unlikely to change that status quo, so all these schemes represent curiosities at best (and discussing them mainly has entertainment value, if that). >> >> A./ >> >> >> On 4/6/2024 2:27 PM, William_J_G Overington via Unicode wrote: >> >> Jim DeLaHunt wrote as follows. >> >>> Why not take all that energy, and put it towards encouraging application developers to provide ways to mix pictures as pictures into the text stream? >> In recent years there have been a few suggestions (by others, not me) in documents in the Unicode Technical Committee Document Register for such systems. If I remember correctly, at least one involved using tag characters. As far as I am aware, none have gone forward. >> >> Reading your post I remembered that over twenty years ago I put forward in this mailing list a suggestion for what I called a .uof file. Trying to find it, as yet unsuccessfully, I found that .uof is now used as a suffix in an entirely different system, an office software system, so if my idea were to become implemented a different file extension would be needed. 
>> >> If I remember correctly, my .uof file suggestion was such that if the plain text file that it accompanied had n uses of the character >> >> U+FFFC OBJECT REPLACEMENT CHARACTER >> >> then the .uof file would have n lines of text, each line of text containing the name of a graphics file, either just a file name for a local file or a URL (Uniform Resource Locator) for a file obtainable from the web, listed in the order that the corresponding U+FFFC character for the graphics file appeared in the plain text file that the .uof file accompanied. >> >> Page 33 of https://www.unicode.org/versions/Unicode15.0.0/ch23.pdf has some notes about U+FFFC. >> >> In those days the Unicode Technical Committe Document Register was not publicly available. After it became publicly available to read I remember that I found that my suggestion of a .uof file had been discussed at a meeting of the Unicode Technical Committee. >> >> So there are various ways to include graphics in, or accompaying and linked to, plain text file content that have been suggested. >> >> If there is a will by the Unicode Technical Committee to go forward and have such a capability agreed and specified in a Unicode Technical Specification then there are various ideas for achieveing a result that have already been put forward, and other ideas maight well be devised too. >> >> As for the possibility of me encouraging application developers to develop systems, well, I am retired and I could not credibly approach them suggesting they spend time and effort implementing my ideas unless I were in a position to pay them to do it. Yet if Unicode Inc. encoded the best system that can be devised, then maybe application developers would choose to take up that system and implement it, and progress would be achieved. >> >>> Why is it so terribly important to use the mechanism of text to deliver pictures in text, instead of using a application-based mechanism of mixed text and pictures? 
>> As far as I am aware, it is a matter of interoperability amongst various platforms and the fact that emoji are used inline with text, at various places within the text, not all together in the style of a diagram accompanying the text.
>>
>> William Overington
>>
>> Saturday 6 April 2024
>>

From gtbot2007 at gmail.com Mon Apr 8 12:03:22 2024
From: gtbot2007 at gmail.com (Gabriel Tellez)
Date: Mon, 8 Apr 2024 13:03:22 -0400
Subject: External Link Symbol
Message-ID:

Unicode rejected external link symbol... but then are left to use the wrong symbol on their own website. SMH

[image: IMG_0047.jpeg]

From jameskass at code2001.com Mon Apr 8 13:37:46 2024
From: jameskass at code2001.com (James Kass)
Date: Mon, 8 Apr 2024 18:37:46 +0000
Subject: External Link Symbol
In-Reply-To:
References:
Message-ID:

On 2024-04-08 5:03 PM, Gabriel Tellez via Unicode wrote:
> Unicode rejected external link symbol... but then are left to use the
> wrong symbol on their own website. SMH

https://steemit.com/unicode/@markgritter/why-did-unicode-reject-the-external-link-symbol

The rejection reasoning quoted in the link above goes on to say that any future proposal for the external link symbol would be considered if it includes convincing evidence that the symbol is needed for plain-text interchange.

From janelle.frazer at comcast.net Mon Apr 8 14:03:48 2024
From: janelle.frazer at comcast.net (Janelle Frazer)
Date: Mon, 08 Apr 2024 12:03:48 -0700
Subject: External Link Symbol
In-Reply-To:
Message-ID: <84409fa5-3df8-4caf-8c93-64cba6bf66e4@email.android.com>

An HTML attachment was scrubbed...
URL:

From jk at koremail.com Mon Apr 8 23:44:55 2024
From: jk at koremail.com (jk at koremail.com)
Date: Tue, 09 Apr 2024 12:44:55 +0800
Subject: External Link Symbol
In-Reply-To:
References:
Message-ID: <9780b4f931abc2f7b0bf9e7df3ead5c9@koremail.com>

This is of course an issue with any proposed character rejected on the grounds of insufficient usage, in that the proposal, the fonts made, etc. and the record thereof create a certain amount of new usage. 😀

From marius.spix at web.de Tue Apr 9 02:09:03 2024
From: marius.spix at web.de (Marius Spix)
Date: Tue, 9 Apr 2024 09:09:03 +0200
Subject: Aw: Re: External Link Symbol
In-Reply-To: <9780b4f931abc2f7b0bf9e7df3ead5c9@koremail.com>
References: <9780b4f931abc2f7b0bf9e7df3ead5c9@koremail.com>
Message-ID:

An HTML attachment was scrubbed...
URL:

From jk at koremail.com Tue Apr 9 02:48:08 2024
From: jk at koremail.com (jk at koremail.com)
Date: Tue, 09 Apr 2024 15:48:08 +0800
Subject: Aw: Re: External Link Symbol
In-Reply-To:
References: <9780b4f931abc2f7b0bf9e7df3ead5c9@koremail.com>
Message-ID: <39b50b3fe5e304e2485fcdfb34288a91@koremail.com>

Yes, and other good reasons not accepted. Please do not take too seriously anything I post with a grinning face emoji. My comment was a variant of a "Never say 'never'!" type joke.
On 2024-04-09 15:09, Marius Spix via Unicode wrote:
> Unicode already has a link symbol (U+1F517) and there is also an emoji
> ZWJ sequence for broken links (so-called red links) or unlink (U+26D3
> U+FE0F U+200D U+1F4A5), which uses the chains symbol (U+26D3) instead
> of the link symbol (U+1F517) for some unknown reason. You can also use
> the rightwards arrow (U+2192) which is used in encyclopedias for
> references.

From gtbot2007 at gmail.com Wed Apr 10 07:27:48 2024
From: gtbot2007 at gmail.com (Gabriel Tellez)
Date: Wed, 10 Apr 2024 08:27:48 -0400
Subject: Aw: Re: External Link Symbol
In-Reply-To: <39b50b3fe5e304e2485fcdfb34288a91@koremail.com>
References: <9780b4f931abc2f7b0bf9e7df3ead5c9@koremail.com> <39b50b3fe5e304e2485fcdfb34288a91@koremail.com>
Message-ID:

Talk about ignoring the point I was trying to make. Unicode's own website used the wrong character because they couldn't use the correct one.
From marius.spix at web.de Wed Apr 10 08:57:43 2024
From: marius.spix at web.de (Marius Spix)
Date: Wed, 10 Apr 2024 15:57:43 +0200
Subject: Aw: Re: Re: External Link Symbol
In-Reply-To: <39b50b3fe5e304e2485fcdfb34288a91@koremail.com>
References: <9780b4f931abc2f7b0bf9e7df3ead5c9@koremail.com> <39b50b3fe5e304e2485fcdfb34288a91@koremail.com>
Message-ID:

I had a look at the nonapproval list and wonder if SQUARE WITH SPECKLES FILL (which had been rejected due to mapping issues) will see another chance in the future, because it is used for colors (tinctures) in heraldry. Each of the following hatchings is associated with a heraldic color:

U+25A1 □ WHITE SQUARE = silver (white)
U+25A4 ▤ SQUARE WITH HORIZONTAL FILL = blue
U+25A5 ▥ SQUARE WITH VERTICAL FILL = red
U+25A6 ▦ SQUARE WITH ORTHOGONAL CROSSHATCH FILL = black
U+25A7 ▧ SQUARE WITH UPPER LEFT TO LOWER RIGHT FILL = green
U+25A8 ▨ SQUARE WITH UPPER RIGHT TO LOWER LEFT FILL = purple
U+25A9 ▩ SQUARE WITH DIAGONAL CROSSHATCH FILL = murrey (a non-standard color)

However, gold (yellow), which is one of the two metals and is represented by a square with dotted fill, is missing. Or should U+1F7E8 + U+FE0E (LARGE YELLOW SQUARE + VARIATION SELECTOR-15) be used for this purpose?

Sent: Tuesday, 09 April 2024 at 09:48
From: "John Knightley via Unicode"
To: "Marius Spix"
Cc: "James Kass" , unicode at corp.unicode.org
Subject: Re: Aw: Re: External Link Symbol

Yes, and other good reasons not accepted. Please do not take too seriously anything I post with a grinning face emoji. My comment was a variant of a "Never say 'never'!" type joke.

On 2024-04-09 15:09, Marius Spix via Unicode wrote:
> Unicode already has a link symbol (U+1F517) and there is also an emoji
> ZWJ sequence for broken links (so-called red links) or unlink (U+26D3
> U+FE0F U+200D U+1F4A5), which uses the chains symbol (U+26D3) instead
> of the link symbol (U+1F517) for some unknown reason.
> You can also use
> the rightwards arrow (U+2192) which is used in encyclopedias for
> references.

From cate at cateee.net Wed Apr 10 09:11:50 2024
From: cate at cateee.net (Giacomo Catenazzi)
Date: Wed, 10 Apr 2024 16:11:50 +0200
Subject: Aw: Re: External Link Symbol
In-Reply-To:
References: <9780b4f931abc2f7b0bf9e7df3ead5c9@koremail.com> <39b50b3fe5e304e2485fcdfb34288a91@koremail.com>
Message-ID: <8ffbcd6a-57b7-42f5-8641-f4701305cae9@cateee.net>

On 2024-04-10 14:27, Gabriel Tellez via Unicode wrote:
> Talk about ignoring the point I was trying to make. Unicode's own website
> used the wrong character because they couldn't use the correct one.

You may file a bug. Do not take the Unicode website as a reference implementation. In the past we had ugly Unicode errors on the payment page (to join this very mailing list).

BTW: how is the conversion to the new site going (the one which caused the .1 version instead of a full version)?

giacomo
From pgcon6 at msn.com Wed Apr 10 10:45:52 2024
From: pgcon6 at msn.com (Peter Constable)
Date: Wed, 10 Apr 2024 15:45:52 +0000
Subject: Aw: Re: External Link Symbol
In-Reply-To:
References: <9780b4f931abc2f7b0bf9e7df3ead5c9@koremail.com> <39b50b3fe5e304e2485fcdfb34288a91@koremail.com>
Message-ID:

Who defines "right" in this case?

I agree the site could use improvement on this point, but the fix I would propose is to use an image asset and not a character.
From jameskass at code2001.com Wed Apr 10 12:09:39 2024
From: jameskass at code2001.com (James Kass)
Date: Wed, 10 Apr 2024 17:09:39 +0000
Subject: External Link Symbol
In-Reply-To:
References: <9780b4f931abc2f7b0bf9e7df3ead5c9@koremail.com> <39b50b3fe5e304e2485fcdfb34288a91@koremail.com>
Message-ID:

On 2024-04-10 3:45 PM, Peter Constable via Unicode wrote:
>
> Who defines "right" in this case?
>
> I agree the site could use improvement on this point, but the fix I
> would propose is to use an image asset and not a character.
>

This can be done in CSS:
https://christianoliff.com/blog/styling-external-links-with-an-icon-in-css/

Since Unicode doesn't regard the symbol as text, perhaps any future proposal should treat it as an emoji. As Marius Spix pointed out, related link symbols already exist in Unicode.

From gtbot2007 at gmail.com Wed Apr 10 19:54:09 2024
From: gtbot2007 at gmail.com (Gabriel Tellez)
Date: Wed, 10 Apr 2024 20:54:09 -0400
Subject: External Link Symbol
In-Reply-To:
References: <9780b4f931abc2f7b0bf9e7df3ead5c9@koremail.com> <39b50b3fe5e304e2485fcdfb34288a91@koremail.com>
Message-ID:

Those are completely unrelated link symbols.

On Wed, Apr 10, 2024 at 1:12 PM James Kass via Unicode <unicode at corp.unicode.org> wrote:
>
>
> On 2024-04-10 3:45 PM, Peter Constable via Unicode wrote:
> >
> > Who defines "right"
> > in this case?
> >
> > I agree the site could use improvement on this point, but the fix I
> > would propose is to use an image asset and not a character.
> >
>
> This can be done in CSS:
> https://christianoliff.com/blog/styling-external-links-with-an-icon-in-css/
>
> Since Unicode doesn't regard the symbol as text, perhaps any future
> proposal should treat it as an emoji. As Marius Spix pointed out,
> related link symbols already exist in Unicode.

From jameskass at code2001.com Wed Apr 10 21:21:58 2024
From: jameskass at code2001.com (James Kass)
Date: Thu, 11 Apr 2024 02:21:58 +0000
Subject: External Link Symbol
In-Reply-To:
References: <9780b4f931abc2f7b0bf9e7df3ead5c9@koremail.com> <39b50b3fe5e304e2485fcdfb34288a91@koremail.com>
Message-ID: <5a680acc-8acf-42f5-b0c1-93e365a9070a@code2001.com>

On 2024-04-11 12:54 AM, Gabriel Tellez via Unicode wrote:
> Those are completely unrelated link symbols.

https://emojipedia.org/link

"Two links of a silver chain, positioned at a 45° angle. Used as an icon for a hyperlink on computers and the internet. May also be used for metaphorical connections."

Seems related.

From doug at ewellic.org Thu Apr 11 00:07:55 2024
From: doug at ewellic.org (Doug Ewell)
Date: Thu, 11 Apr 2024 05:07:55 +0000
Subject: Aw: Re: External Link Symbol
In-Reply-To:
References: <9780b4f931abc2f7b0bf9e7df3ead5c9@koremail.com> <39b50b3fe5e304e2485fcdfb34288a91@koremail.com>
Message-ID:

Gabriel Tellez wrote:

> Talk about ignoring the point I was trying to make. Unicode's own website
> used the wrong character because they couldn't use the correct one.

They're not using a "character" at all. They're using a glyph from one of those icon fonts, which map all kinds of graphics that aren't necessarily suitable for encoding as characters (such as the old Twitter logo) to code points in the PUA. That's standard practice for many web sites, to make handling of small icons easier.
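[Editor's sketch, not part of the original message.] The icon-font practice described above can be checked mechanically: a code point sits in a private-use area when its General_Category is Co. The following minimal Python sketch uses only the standard unicodedata module. The helper name is_private_use and the sample code point U+F08E are assumptions made for illustration; they are not taken from the Unicode website or from any particular font.

```python
import unicodedata


def is_private_use(code_point: int) -> bool:
    """Return True if the code point has General_Category Co (private use)."""
    return unicodedata.category(chr(code_point)) == "Co"


# U+F08E is a hypothetical icon-font slot, chosen only for illustration.
# U+2197 and U+1F517 are the arrow and link symbol named in this thread.
for cp in (0xF08E, 0x2197, 0x1F517):
    kind = "private use" if is_private_use(cp) else "not private use"
    print(f"U+{cp:04X}: {kind}")
```

A private-use code point carries no agreed meaning in interchange, so an icon placed there only displays as intended where that particular font is loaded.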
Unicode isn't a standard for encoding symbols per se; there are many ISO standards that fill that role. Unicode is a standard for encoding characters that appear in plain text, many of which happen to be symbols. A symbol that appears on an HTML web page is not necessarily one that appears in plain text; remember what the "ML" in HTML stands for.

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org

From christoph.paeper at crissov.de Thu Apr 11 10:12:13 2024
From: christoph.paeper at crissov.de (Christoph Päper)
Date: Thu, 11 Apr 2024 17:12:13 +0200
Subject: External Link Symbol
In-Reply-To: <5a680acc-8acf-42f5-b0c1-93e365a9070a@code2001.com>
References: <5a680acc-8acf-42f5-b0c1-93e365a9070a@code2001.com>
Message-ID: <2F9BEC23-302A-4773-BD23-7C41B1C3191E@crissov.de>

James Kass via Unicode:
>
> https://emojipedia.org/link
>
> "Two links of a silver chain, positioned at a 45° angle. Used as an icon for a hyperlink on computers and the internet. May also be used for metaphorical connections."
>
> Seems related.

Only remotely. These emoji resemble text editor GUI icons for inserting a hyperlink, but this thread is about a symbol that is used before or after the link text in web pages to indicate that its target is "off site", whatever that means exactly.

From asmusf at ix.netcom.com Thu Apr 11 10:47:11 2024
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Thu, 11 Apr 2024 08:47:11 -0700
Subject: External Link Symbol
In-Reply-To: <2F9BEC23-302A-4773-BD23-7C41B1C3191E@crissov.de>
References: <5a680acc-8acf-42f5-b0c1-93e365a9070a@code2001.com> <2F9BEC23-302A-4773-BD23-7C41B1C3191E@crissov.de>
Message-ID:

On 4/11/2024 8:12 AM, Christoph Päper via Unicode wrote:
> James Kass via Unicode:
>> https://emojipedia.org/link
>>
>> "Two links of a silver chain, positioned at a 45° angle. Used as an icon for a hyperlink on computers and the internet. May also be used for metaphorical connections."
>>
>> Seems related.
> Only remotely. These emoji
> resemble text editor GUI icons for inserting a hyperlink, but this thread is about a symbol that is used before or after the link text in web pages to indicate that its target is "off site", whatever that means exactly.
>

There are several related questions that should not be mixed up:

* Are two shapes visually similar (whether or not they express the same concept/semantics)?
* Are two concepts related (even if not the same, are they from the same field of application)?

If the answer to the latter is YES, then there is an argument to be made for parallel treatment in encoding.

The question on visual similarity is more complicated, and conflates several distinct scenarios. The four main cases are:

1. The shapes are accidentally similar, but the concepts are distinct
2. The shapes are similar and valid glyph alternatives for the same concept
3. The shapes are distinct, but reflect different notational choices to represent the same concept
4. The shapes and concepts are unrelated.

We agree that the diagonal chain link emoji is used in GUIs and other places to indicate a link or an action to add / edit a link. We also agree that link and external link are both related to URLs, but the concept of "external link" cannot be conflated with the generic "link".

Given that "link" is now available as encoded character, I don't feel the warm and fuzzies about a principled stand to restrict the "external link" to an external image. There's nothing inherent in the distinction that absolutely must be reflected in a disparate decision on encoding for these two.

In other words, it strikes me as silly. If it had been added when first proposed, we'd probably see widespread adoption by now. That said, it's easy enough to realize with a site-wide image.

A./
From sosipiuk at gmail.com Thu Apr 11 11:28:25 2024
From: sosipiuk at gmail.com (Sławomir Osipiuk)
Date: Thu, 11 Apr 2024 16:28:25 +0000
Subject: External Link Symbol
In-Reply-To:
References:
Message-ID: <1712852286169.3195214627.3679976055@gmail.com>

There are actually three kinds of links that are distinguishable from each other:

- A link to a different location in the current document (anchor link/jump link)
- A link to a resource on the same network/domain as the current document (local link/relative link)
- A link to a resource on a different network (external link)

All those can appear as symbols, used contrastively, within a run of text. I'm very surprised these haven't already been encoded and that there is any controversy. The consortium doesn't care much for precedent, but come on, we have "play" and "eject" symbols encoded!

From tom.moore at microsoft.com Thu Apr 11 13:47:42 2024
From: tom.moore at microsoft.com (Tom Moore)
Date: Thu, 11 Apr 2024 18:47:42 +0000
Subject: [EXTERNAL] Re: External Link Symbol
In-Reply-To: <1712852286169.3195214627.3679976055@gmail.com>
References: <1712852286169.3195214627.3679976055@gmail.com>
Message-ID:

Then multiply that by 2, for links that navigate current tab vs. request to open a new tab.

-----Original Message-----
From: Unicode On Behalf Of Slawomir Osipiuk via Unicode
Sent: Thursday, April 11, 2024 9:28 AM
To: asmusf ; Asmus Freytag via Unicode
Subject: [EXTERNAL] Re: External Link Symbol

There are actually three kinds of links that are distinguishable from each other:

- A link to a different location in the current document (anchor link/jump link)
- A link to a resource on the same network/domain as the current document (local link/relative link)
- A link to a resource on a different network (external link)

All those can appear as symbols, used contrastively, within a run of text. I'm very surprised these haven't already been encoded and that there is any controversy.
The consortium doesn't care much for precedent, but come on, we have "play" and "eject" symbols encoded!

From asmusf at ix.netcom.com Thu Apr 11 14:05:55 2024
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Thu, 11 Apr 2024 12:05:55 -0700
Subject: [EXTERNAL] Re: External Link Symbol
In-Reply-To:
References: <1712852286169.3195214627.3679976055@gmail.com>
Message-ID:

On 4/11/2024 11:47 AM, Tom Moore wrote:
> Then multiply that by 2, for links that navigate current tab vs. request to open a new tab.

Is there a link to samples for all of these as used in practice, or is this just a theoretical distinction?

A./

From mark at kli.org Thu Apr 11 16:27:29 2024
From: mark at kli.org (Mark E. Shoulson)
Date: Thu, 11 Apr 2024 17:27:29 -0400
Subject: External Link Symbol
In-Reply-To:
References: <5a680acc-8acf-42f5-b0c1-93e365a9070a@code2001.com> <2F9BEC23-302A-4773-BD23-7C41B1C3191E@crissov.de>
Message-ID:

On 4/11/24 11:47, Asmus Freytag via Unicode wrote:
>
> Given that "link" is now available as encoded character, I don't feel
> the warm and fuzzies about a principled stand to restrict the
> "external link" to an external image.
> There's nothing inherent in the
> distinction that absolutely must be reflected in a disparate decision
> on encoding for these two.
>
> In other words, it strikes me as silly. If it had been added when
> first proposed, we'd probably see widespread adoption by now. That
> said, it's easy enough to realize with a site-wide image.
>

The external link character almost seems like a no-brainer for me. Once Wikipedia started using an image, it became extremely well-known and recognizable and started popping up all over the place, usually the exact same image or something very similar. While it's true that HTML and links are almost definitionally not "plain text" (it's a link), that line has never been really bright (you can tell because we argue about it all the time here.) WP's external link symbol is way closer to plain text and has far more usage than most of the map symbols we've encoded, and probably more than emoji as well. As Asmus said, if it had been added when proposed, we'd be seeing widespread usage, and indeed we're seeing widespread usage even though it wasn't added, with various SVG images etc. That's how you manufacture usage to justify encoding. To me, this is one of the most encoding-worthy symbols I've seen out there, and I'm astonished it still isn't encoded. But that's just me.

~mark

From marius.spix at web.de Fri Apr 12 02:31:22 2024
From: marius.spix at web.de (Marius Spix)
Date: Fri, 12 Apr 2024 09:31:22 +0200
Subject: Aw: Re: [EXTERNAL] Re: External Link Symbol
In-Reply-To:
References: <1712852286169.3195214627.3679976055@gmail.com>
Message-ID:

An HTML attachment was scrubbed...
URL:

From mark at kli.org Fri Apr 12 07:26:56 2024
From: mark at kli.org (Mark E.
Shoulson)
Date: Fri, 12 Apr 2024 08:26:56 -0400
Subject: Aw: Re: [EXTERNAL] Re: External Link Symbol
In-Reply-To:
References: <1712852286169.3195214627.3679976055@gmail.com>
Message-ID: <525cd318-58dd-4223-8ec1-9c2be3a96a46@kli.org>

On 4/12/24 03:31, Marius Spix via Unicode wrote:
> For all these types of links existing characters can be used:
> anchor links: U+00B6 ¶ PILCROW SIGN
> local links: U+1F517 🔗 LINK SYMBOL
> broken links (also known as red-links): U+26D3 U+200D U+1F4A5 CHAINS
> + ZERO WIDTH JOINER + COLLISION SYMBOL
> external links: U+2192 → RIGHTWARDS ARROW

Good suggestions. There's "can be used", though, and there's "are being used." I've certainly seen the PILCROW SIGN used for anchor links, though generally only at the "anchor" end, not at the link end. Many web pages have the pilcrow sign appearing on hover-over on headers which act as anchors. And not every place uses the Wikipedia arrow-and-box symbol for external links; I think I've seen things like RIGHTWARDS ARROW or other arrows used. But lots of places use the Wikipedia-style arrow-and-box. Saying, "well, you could use something else" is sort of like saying "we don't need to encode Devanagari, you can just transliterate into Latin, it says the same thing."

~mark

From christoph.paeper at crissov.de Fri Apr 12 07:37:15 2024
From: christoph.paeper at crissov.de (Christoph Päper)
Date: Fri, 12 Apr 2024 14:37:15 +0200
Subject: External Link Symbol
In-Reply-To:
References:
Message-ID:

An HTML attachment was scrubbed...
URL:

From asmusf at ix.netcom.com Fri Apr 12 11:46:07 2024
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Fri, 12 Apr 2024 09:46:07 -0700
Subject: Aw: Re: [EXTERNAL] Re: External Link Symbol
In-Reply-To:
References: <1712852286169.3195214627.3679976055@gmail.com>
Message-ID: <819bbaf0-5f13-4bc5-a2fc-4b370e5c1471@ix.netcom.com>

The first and last choice are arguably not the most conventional representations for these. They are, at best, fallbacks.

A./

On 4/12/2024 12:31 AM, Marius Spix via Unicode wrote:
> For all these types of links existing characters can be used:
> anchor links: U+00B6 ¶ PILCROW SIGN
> local links: U+1F517 🔗 LINK SYMBOL
> broken links (also known as red-links): U+26D3 U+200D U+1F4A5 CHAINS
> + ZERO WIDTH JOINER + COLLISION SYMBOL
> external links: U+2192 → RIGHTWARDS ARROW
From asmusf at ix.netcom.com Fri Apr 12 12:59:18 2024
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Fri, 12 Apr 2024 10:59:18 -0700
Subject: Aw: Re: [EXTERNAL] Re: External Link Symbol
In-Reply-To: <525cd318-58dd-4223-8ec1-9c2be3a96a46@kli.org>
References: <1712852286169.3195214627.3679976055@gmail.com> <525cd318-58dd-4223-8ec1-9c2be3a96a46@kli.org>
Message-ID: <61e59f28-a757-491a-8f7a-4f8a980e5d8a@ix.netcom.com>

On 4/12/2024 5:26 AM, Mark E. Shoulson via Unicode wrote:
> On 4/12/24 03:31, Marius Spix via Unicode wrote:
>> For all these types of links existing characters can be used:
>> anchor links: U+00B6 ¶ PILCROW SIGN
>> local links: U+1F517 🔗 LINK SYMBOL
>> broken links (also known as red-links): U+26D3 U+200D U+1F4A5 CHAINS
>> + ZERO WIDTH JOINER + COLLISION SYMBOL
>> external links: U+2192 → RIGHTWARDS ARROW
>
> Good suggestions. There's "can be used", though, and there's "are
> being used." I've certainly seen the PILCROW SIGN used for anchor
> links, though generally only at the "anchor" end, not at the link
> end. Many web pages have the pilcrow sign appearing on hover-over on
> headers which act as anchors. And not every place uses the Wikipedia
> arrow-and-box symbol for external links; I think I've seen things like
> RIGHTWARDS ARROW or other arrows used. But lots of places use the
> Wikipedia-style arrow-and-box. Saying, "well, you could use something
> else" is sort of like saying "we don't need to encode Devanagari, you
> can just transliterate into Latin, it says the same thing."
>
> ~mark
>

"Can be used" is not the standard we should apply.
There may be legitimate alternate representations for the same concept,
but that doesn't get us out of recognizing a case like this where
there's a clear favorite in widespread use as a symbol.

I get it when Unicode is hesitant about encoding just "any" symbol, such
as traffic signs, because signage and text are distinct use cases. But
in this case, the rationale for not encoding this is very thin,
particularly because a lot of parallel cases are clearly available as
characters.

A./
-------------- next part --------------
An HTML attachment was scrubbed...

From doug at ewellic.org Fri Apr 12 23:36:18 2024
From: doug at ewellic.org (Doug Ewell)
Date: Sat, 13 Apr 2024 04:36:18 +0000
Subject: Aw: Re: [EXTERNAL] Re: External Link Symbol
In-Reply-To: <819bbaf0-5f13-4bc5-a2fc-4b370e5c1471@ix.netcom.com>
References: <1712852286169.3195214627.3679976055@gmail.com> <819bbaf0-5f13-4bc5-a2fc-4b370e5c1471@ix.netcom.com>
Message-ID:

>> For all these types of links existing characters can be used:
>>
>> anchor links: U+00B6 ¶ PILCROW SIGN
>> local links: U+1F517 🔗 LINK SYMBOL
>> broken links (also known as red-links): U+26D3 U+200D U+1F4A5 CHAINS
>> + ZERO WIDTH JOINER + COLLISION SYMBOL
>> external links: U+2192 → RIGHTWARDS ARROW
>
> The first and last choice are arguably not the most conventional
> representations for these. They are, at best, fallbacks.

The irony of all this is that the OP's argument was that "Unicode"
(actually, whatever company or designer Unicode, Inc. hired to do web
design) was "left to use the wrong symbol on their own website" because
external-link isn't available as a character...

... yet the symbol they chose to implement as a graphic is just a
slightly repositioned variant of U+2197 NORTH EAST ARROW, which has been
encoded in Unicode literally since version 1.0.
--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org

From marius.spix at web.de Mon Apr 15 08:55:04 2024
From: marius.spix at web.de (Marius Spix)
Date: Mon, 15 Apr 2024 15:55:04 +0200
Subject: Aw: Re: Re: [EXTERNAL] Re: External Link Symbol
In-Reply-To: <819bbaf0-5f13-4bc5-a2fc-4b370e5c1471@ix.netcom.com>
References: <1712852286169.3195214627.3679976055@gmail.com> <819bbaf0-5f13-4bc5-a2fc-4b370e5c1471@ix.netcom.com>
Message-ID:

An HTML attachment was scrubbed...

From asmusf at ix.netcom.com Mon Apr 15 09:35:26 2024
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Mon, 15 Apr 2024 07:35:26 -0700
Subject: Aw: Re: Re: [EXTERNAL] Re: External Link Symbol
In-Reply-To:
References: <1712852286169.3195214627.3679976055@gmail.com> <819bbaf0-5f13-4bc5-a2fc-4b370e5c1471@ix.netcom.com>
Message-ID: <13e33901-f7e8-4519-ac69-1188f84a9d94@ix.netcom.com>

On 4/15/2024 6:55 AM, Marius Spix wrote:
> The pilcrow sign is officially mentioned in RFC 7992. See section 5.2.
> So I would consider it the conventional representation for anchor links.

I would agree that it is "a convention" for representation of anchor
links. It happens to work for English, as the pilcrow sign
conventionally means "paragraph" and the intent in RFC 7992 is to
provide links to all paragraphs.

However, the formatting of RFCs provided as HTML is a different beast
from a generic prescription for formatting all HTML documents. So this
should not be overinterpreted.

A./
-------------- next part --------------
An HTML attachment was scrubbed...
From don.hosek at gmail.com Wed Apr 17 13:46:47 2024
From: don.hosek at gmail.com (Don Hosek)
Date: Wed, 17 Apr 2024 14:46:47 -0400
Subject: Questions about Indic Conjuct Clusters
Message-ID:

It's not immediately clear from the Unicode 15.1.0 specification what
the correct implementation would be for a few pathological cases of
Indic conjunct clusters.

For convenience's sake, let's use the following shorthand:

C = \p{InCB=Consonant}
E = \p{InCB=Extend}
L = \p{InCB=Linker}
M = \p{M}

1. It appears that both E and L are subsets of M and I think E ∪ L = M.
Is this correct? If so, is GB9c equivalent to saying that CM+C should be
considered a single cluster iff that sequence of characters M+ contains
at least one character from L? (Having written this question and looking
at the statement of the rule from
https://www.unicode.org/Public/UCD/latest/ucd/auxiliary/GraphemeBreakTest.html,
my restatement seems to correspond to 9.3 in that list.)

2. Should a sequence like, e.g., CLCLC be considered a single cluster or
would it be two clusters, CLCL ÷ C?

I would note also that the chart at
https://www.unicode.org/Public/UCD/latest/ucd/auxiliary/GraphemeBreakTest.html
seems to be not quite correct.

-dh
-------------- next part --------------
An HTML attachment was scrubbed...

From pgcon6 at msn.com Wed Apr 17 19:32:22 2024
From: pgcon6 at msn.com (Peter Constable)
Date: Thu, 18 Apr 2024 00:32:22 +0000
Subject: Aw: Re: Re: [EXTERNAL] Re: External Link Symbol
In-Reply-To: <13e33901-f7e8-4519-ac69-1188f84a9d94@ix.netcom.com>
References: <1712852286169.3195214627.3679976055@gmail.com> <819bbaf0-5f13-4bc5-a2fc-4b370e5c1471@ix.netcom.com> <13e33901-f7e8-4519-ac69-1188f84a9d94@ix.netcom.com>
Message-ID:

Let's be clear: all that RFC 7992 is doing is documenting the
conventions used in the non-canonical HTML versions of IETF RFCs. Unless
in some other context there is a specification that normatively
references RFC 7992, it has no real import beyond the HTML versions of
IETF RFCs.

Peter
-------------- next part --------------
An HTML attachment was scrubbed...

From asmusf at ix.netcom.com Wed Apr 17 19:56:02 2024
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Wed, 17 Apr 2024 17:56:02 -0700
Subject: Aw: Re: Re: [EXTERNAL] Re: External Link Symbol
In-Reply-To:
References: <1712852286169.3195214627.3679976055@gmail.com> <819bbaf0-5f13-4bc5-a2fc-4b370e5c1471@ix.netcom.com> <13e33901-f7e8-4519-ac69-1188f84a9d94@ix.netcom.com>
Message-ID:

This spells out what I tried to imply by warning against
overinterpreting this example. However, whether something is a normative
specification, and the limits of its normative scope, is always only one
aspect. Another aspect is whether it represents a record of somebody's
explicit convention for what to do for a given feature, such as "anchor"
links in the current example.

Having a written convention that documents intent is preferable to
relying on mere observation, for example, noticing that certain
documents or certain platforms just happen to behave in a certain way.
But, on its own, just because something is written down is certainly not
enough to suggest that the convention is common, let alone universal.

From Unicode's perspective, a convention does not have to be normative
for general purpose documents to be taken into account in making
informed encoding decisions. The degree to which it is followed in
practice (both in its original domain as well as in analogous cases) is
usually more important.

If I were a submitter, I would then treat something like the citation of
RFC 7992 not as something that "settles" an encoding question, but as
one that calls for further research to see whether that particular
convention is found in (enough) other places to help reach a decision.
Alternatively, it might serve as a data point for the conclusion that
there's no single convention (with further research needed to find out
whether this represents a case of a small number of alternate coexisting
conventions, or the case of something where the real world hasn't
settled on anything).

A./
-------------- next part --------------
An HTML attachment was scrubbed...

From mark at kli.org Wed Apr 17 20:20:01 2024
From: mark at kli.org (Mark E. Shoulson)
Date: Wed, 17 Apr 2024 21:20:01 -0400
Subject: HEBREW HE-WITH-ADNY-INSIDE
In-Reply-To:
References:
Message-ID: <1b2771dd-39b7-4f59-a9f2-4a81bacec565@kli.org>

Wow, not a peep about this? Surely a group this opinionated would have
something to say. I guess I should propose this, since it's in use?
Probably would have a compatibility equivalence to just plain HEBREW
LETTER HE.

~mark

From pgcon6 at msn.com Thu Apr 18 12:13:23 2024
From: pgcon6 at msn.com (Peter Constable)
Date: Thu, 18 Apr 2024 17:13:23 +0000
Subject: Aw: Re: Re: [EXTERNAL] Re: External Link Symbol
In-Reply-To:
References: <1712852286169.3195214627.3679976055@gmail.com> <819bbaf0-5f13-4bc5-a2fc-4b370e5c1471@ix.netcom.com> <13e33901-f7e8-4519-ac69-1188f84a9d94@ix.netcom.com>
Message-ID:

> If I were a submitter, I would then treat something like the citation
> of RFC 7992 not as something that "settles" an encoding question, but
> as one that calls for further research to see whether that particular
> convention is found in (enough) other places to help reach a decision.

Completely agree with that.

However, there is another factor in this that UTC will consider and
_has_ considered: is the symbol used in public text data interchange?
With regard to the external link symbol, UTC has previously decided that
this falls into the general class of symbols used in app iconography and
that evidence based on such usage is not sufficient for encoding as a
textual character.

Peter
-------------- next part --------------
An HTML attachment was scrubbed...
From asmusf at ix.netcom.com Thu Apr 18 12:31:40 2024
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Thu, 18 Apr 2024 10:31:40 -0700
Subject: Aw: Re: Re: [EXTERNAL] Re: External Link Symbol
In-Reply-To:
References: <1712852286169.3195214627.3679976055@gmail.com> <819bbaf0-5f13-4bc5-a2fc-4b370e5c1471@ix.netcom.com> <13e33901-f7e8-4519-ac69-1188f84a9d94@ix.netcom.com>
Message-ID: <90017c84-43c8-416f-a7fb-99e3463e4e63@ix.netcom.com>

In this context it is interesting that RFC 7992 suggests use of a
character code and, incidentally, makes the case that there are use
cases where reliance on external images is explicitly ruled out.

This underscores the "chicken-and-egg" problem we so often encounter in
character encoding. Unless a character is encoded, it can't be used in a
text-only environment.

In this particular case, my reading would be that there's nothing
inherently favoring an image-only solution. It just happens that this
was the only way to go in early implementations, but as RFC 7992 reminds
us, there are use cases where that is not an option.

As a result, as I wrote in an earlier part of this thread, I can't get
the warm and fuzzies about applying the "app iconography" principle
here. Particularly not for something that always appears in connection
with, and as part of, a text block. There are many other examples that
are much more removed from text and to which that principle is more
properly applicable.

A./
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From pgcon6 at msn.com Fri Apr 19 10:29:50 2024 From: pgcon6 at msn.com (Peter Constable) Date: Fri, 19 Apr 2024 15:29:50 +0000 Subject: Aw: Re: Re: [EXTERNAL] Re: External Link Symbol In-Reply-To: <90017c84-43c8-416f-a7fb-99e3463e4e63@ix.netcom.com> References: <1712852286169.3195214627.3679976055@gmail.com> <819bbaf0-5f13-4bc5-a2fc-4b370e5c1471@ix.netcom.com> <13e33901-f7e8-4519-ac69-1188f84a9d94@ix.netcom.com> <90017c84-43c8-416f-a7fb-99e3463e4e63@ix.netcom.com> Message-ID: Non-use of external files is a stated requirement for this context. But it does not state any requirement of a character code to display an icon for external links, and you can review the HTML version of RFC 7992 to see that it does not use any such icons. In conclusion, RFC 7992 does not provide any argument for UTC to encode a character for an external link icon. Peter From: Asmus Freytag Sent: Thursday, April 18, 2024 10:32 AM To: Peter Constable ; Marius Spix Cc: unicode at corp.unicode.org Subject: Re: Aw: Re: Re: [EXTERNAL] Re: External Link Symbol In this context it is interesting that RFC7992 suggests use of a character code and, incidentally, makes the case that there are use-cases where reliance on external images is explicitly ruled out. This underscores the "chicken-and-egg" problem we so often encounter in character encoding. Unless a character is encoded, it can't be used in a text-only environment. In this particular case, my reading would be that there's nothing inherently favoring an image-only solution. It just happens that this was the only way to go in early implementations, but as RFC7992 reminds us, there are use cases where that is not an option. As a result, as I wrote in an earlier part of this thread, I can't get the warm and fuzzies about applying the "app iconography" principle here. Particularly not for something that always appears in connection with and part of a text block. 
There are many other examples that are much more removed from text and to which that principle is more properly applicable.

A./

On 4/18/2024 10:13 AM, Peter Constable wrote:

> If I were a submitter, I would treat something like the citation of RFC7992 then not as something that "settles" an encoding question, but one that calls for further research to see whether that particular convention is found in (enough) other places to help reach a decision.

Completely agree with that. However, there is another factor in this that UTC will consider and _has_ considered: is the symbol used in public text data interchange? With respect to the external link symbol, UTC has previously decided that this falls into the general class of symbols used in app iconography and that evidence based on such usage is not sufficient for encoding as a textual character.

Peter

From: Asmus Freytag
Sent: Wednesday, April 17, 2024 5:56 PM
To: Peter Constable; Marius Spix
Cc: unicode at corp.unicode.org
Subject: Re: Aw: Re: Re: [EXTERNAL] Re: External Link Symbol

This spells out what I tried to imply by warning against overinterpreting this example.

However, whether something is a normative specification, and the limits of its normative scope, is always only one aspect. Another aspect is whether it represents a record of somebody's explicit convention for what to do for a given feature, such as "anchor" links in the current example. Having a written convention that documents intent is preferable over relying on mere observation, for example, noticing that certain documents or certain platforms just happen to behave in a certain way. But, on its own, just because something is written down is certainly not enough to suggest that the convention is common, let alone universal.

From Unicode's perspective, a convention does not have to be normative for general purpose documents to be taken into account in making informed encoding decisions.
The degree to which it is followed in practice (both in its original domain as well as in analogous cases) is usually more important.

If I were a submitter, I would treat something like the citation of RFC7992 then not as something that "settles" an encoding question, but one that calls for further research to see whether that particular convention is found in (enough) other places to help reach a decision. Alternatively, it might serve as a data point for the conclusion that there's no single convention (with further research needed to find out whether this represents a case of a small number of alternate coexisting conventions, or the case of something where the real world hasn't settled on anything).

A./

On 4/17/2024 5:32 PM, Peter Constable wrote:

Let's be clear: all that RFC 7992 is doing is documenting the conventions used in the non-canonical HTML versions of IETF RFCs. Unless in some other context there is a specification that normatively references RFC 7992, it has no real import beyond the HTML versions of IETF RFCs.

Peter

From: Unicode On Behalf Of Asmus Freytag via Unicode
Sent: Monday, April 15, 2024 7:35 AM
To: Marius Spix
Cc: unicode at corp.unicode.org
Subject: Re: Aw: Re: Re: [EXTERNAL] Re: External Link Symbol

On 4/15/2024 6:55 AM, Marius Spix wrote:

The pilcrow sign is officially mentioned in RFC 7992. See section 5.2. So I would consider it the conventional representation for anchor links.

I would agree that it is "a convention" for representation of anchor links. It happens to work for English, as the pilcrow sign conventionally means "paragraph" and the intent in RFC7992 is to provide links to all paragraphs.

However, the formatting of RFCs provided as HTML is a different beast from generic prescription for formatting all HTML documents. So this should not be overinterpreted.

A./

Gesendet: Freitag, 12.
April 2024 um 18:46 Uhr
Von: "Asmus Freytag via Unicode"
An: unicode at corp.unicode.org
Betreff: Re: Aw: Re: [EXTERNAL] Re: External Link Symbol

The first and last choice are arguably not the most conventional representations for these. They are, at best, fallbacks.

A./

On 4/12/2024 12:31 AM, Marius Spix via Unicode wrote:

For all these types of links existing characters can be used:
anchor links: U+00B6 ¶ PILCROW SIGN
local links: U+1F517 🔗 LINK SYMBOL
broken links (also known as red-links): U+26D3 U+200D U+1F4A5 CHAINS + ZERO WIDTH JOINER + COLLISION SYMBOL
external links: U+2192 → RIGHTWARDS ARROW

Gesendet: Donnerstag, 11. April 2024 um 21:05 Uhr
Von: "Asmus Freytag via Unicode"
An: "Tom Moore", "Sławomir Osipiuk", "Asmus Freytag via Unicode"
Betreff: Re: [EXTERNAL] Re: External Link Symbol

On 4/11/2024 11:47 AM, Tom Moore wrote:
> Then multiply that by 2, for links that navigate current tab vs. request to open a new tab.

Is there a link to samples for all of these as used in practice, or is this just a theoretical distinction?

A./

>
> -----Original Message-----
> From: Unicode On Behalf Of Slawomir Osipiuk via Unicode
> Sent: Thursday, April 11, 2024 9:28 AM
> To: asmusf; Asmus Freytag via Unicode
> Subject: [EXTERNAL] Re: External Link Symbol
>
> There are actually three kinds of links that are distinguishable from each
> other:
>
> - A link to a different location in the current document (anchor link/jump
> link)
> - A link to a resource on the same network/domain as the current document (local link/relative link)
> - A link to a resource on a different network (external link)
>
> All those can appear as symbols, used contrastively, within a run of text.
> I'm very surprised these haven't already been encoded and that there is any controversy. The consortium doesn't care much for precedent, but come on, we have "play" and "eject" symbols encoded!
From smontagu at smontagu.org Sat Apr 20 13:18:26 2024
From: smontagu at smontagu.org (Simon Montagu)
Date: Sat, 20 Apr 2024 21:18:26 +0300
Subject: HEBREW HE-WITH-ADNY-INSIDE
In-Reply-To: <1b2771dd-39b7-4f59-a9f2-4a81bacec565@kli.org>
References: <1b2771dd-39b7-4f59-a9f2-4a81bacec565@kli.org>
Message-ID: <47445d1d-654f-49eb-b9e1-3171e4462140@smontagu.org>

Is there any use case for this glyph except as the last letter of the Tetragrammaton? Does it make sense to encode it separately rather than the whole combination HEBREW TETRAGRAMMATON WITH ADNY INSIDE THE HE?

On 18/04/2024 04:20, Mark E. Shoulson via Unicode wrote:
> Wow, not a peep about this? Surely a group this opinionated would have
> something to say. I guess I should propose this, since it's in use?
> Probably would have a compatibility equivalence to just plain HEBREW
> LETTER HE.
>
> ~mark
>
> On 4/1/24 17:39, Mark E. Shoulson via Unicode wrote:
>> Looking waaaay back to my opus (with Michael Everson) of 1998,
>> http://std.dkuug.dk/jtc1/sc2/wg2/docs/n1740/n1740.htm, I call to
>> attention one particular case mentioned there: the case where the
>> second HEBREW LETTER HE of the Tetragrammaton is made very wide and
>> another Holy Name (Adonay, ALEF-DALET-NUN-YOD) is printed in smaller
>> letters inside it. As mentioned last century, this is even now (well,
>> then) commonly met with, especially in Sephardic prayer books.
>>
>> I mention it because I've found a bunch of professional Hebrew fonts
>> which have a glyph for this special character. Take a look at any one
>> of many (but not all) of the offerings of the Samtype Foundry at
>> https://www.myfonts.com/collections/samtype-foundry and you'll see
>> what I mean. Sometimes it's visible in the sample image, sometimes it
>> isn't even though it's in the font.
They seem to be placing the glyph
>> at codepoint U+FB50, which is ARABIC LETTER ALEF WASLA ISOLATED FORM,
>> probably because it's the next character after the extended Hebrew
>> code-block that ends at U+FB4F HEBREW LIGATURE ALEF LAMED and because,
>> being in an Arabic codeblock, it has RTL directionality (while the PUA
>> I think has LTR directionality, which is most inconvenient.)
>>
>> So it seems that this really is a thing being used by typefounders
>> even now. Probably should be encoded, yes? My rationale from 1998 of
>> encoding the Tetragrammaton as a glyph in itself was apparently not
>> accepted, though after a later paper,
>> https://unicode.org/L2/L2015/15092-hebew-nomina-sacra.pdf and some
>> discussion, the YOD TRIANGLE U+05EF was encoded. Perhaps this should
>> be too? I guess as a variant of HE perhaps? (the name in the
>> subject-header is not meant as a serious proposal for the glyph-name,
>> though this letter is actually serious, despite the date.)
>>
>> ~mark
>

From mark at kli.org Sat Apr 20 20:22:33 2024
From: mark at kli.org (Mark E. Shoulson)
Date: Sat, 20 Apr 2024 21:22:33 -0400
Subject: HEBREW HE-WITH-ADNY-INSIDE
In-Reply-To: <47445d1d-654f-49eb-b9e1-3171e4462140@smontagu.org>
References: <1b2771dd-39b7-4f59-a9f2-4a81bacec565@kli.org> <47445d1d-654f-49eb-b9e1-3171e4462140@smontagu.org>
Message-ID:

I don't think there's any use of it outside of being the last letter in the Tetragrammaton (except... not always the last letter... I was looking at a prayer-book this morning... see below.) Encoding it as you suggest would be somewhat closer to my original proposal back in 1998, of HEBREW TETRAGRAMMATON.

The thing is, it isn't QUITE always the last letter, kinda... I think there's an example in one of the papers I linked, but Sephardic prayer-books (which is pretty much the only place I've seen this to begin with) sometimes use different vowel-points on the Tetragrammaton, presumably for Kabbalistic reasons.
So you'll often see it pointed as normal, יְהֹוָה, SHEVA, HOLAM, QAMATS (some room to argue whether the HOLAM is on the HE or the VAV, but it doesn't really matter.), but then you'll see a paragraph where it's ???????? the first time and ???????? the second time and ???????? and so on... and it _definitely_ also appears as ????????????, apparently considering the shuruk, וּ, to be a vowel and not a "letter" (I do not see ???????????? even though the "holam maleh" is also a vowel, I guess because holam maleh and holam haser are the same vowel, and ???????? (which does occur) suffices. But ??? is not the same vowel as ?? (qubuts), at least not classically: one is considered long and one short.)

If, as I was thinking in the original 1998 proposal, we consider all these to be glyphic variants of each other, then we could encode it as you suggest, and it would be essentially a special case (or set of special cases) of my original proposal, much like U+05EF YOD TRIANGLE ׯ became. That might not be a bad idea; my proposal was seen as trying to take things too far, and maybe it was, and smaller solutions are warranted. Maybe it should still have compatibility decomposition to YOD HE VAV HE for searching and equivalence purposes?

Mmf, having trouble reaching the dkuug site; https://www.evertype.com/standards/tetra/tetra.html is a link to Michael Everson's site with the original proposal. It shows a scanned example of different vowel-pointing, and mentions and shows a non-scanned example of the "eight-letter tetragrammaton", with וּ as the vowel, but it doesn't show an actual occurrence of that in scanned text. I can easily supply that, though.

(I have never seen this usage in an instance of the Tetragrammaton that is meant to be pronounced ELOHIM. I don't know if it's done or how.)

~mark

On 4/20/24 14:18, Simon Montagu via Unicode wrote:
> Is there any use case for this glyph except as the last letter of the
> Tetragrammaton?
Does it make sense to encode it separately rather than
> the whole combination HEBREW TETRAGRAMMATON WITH ADNY INSIDE THE HE?
>
> On 18/04/2024 04:20, Mark E. Shoulson via Unicode wrote:
>> Wow, not a peep about this? Surely a group this opinionated would
>> have something to say. I guess I should propose this, since it's in
>> use? Probably would have a compatibility equivalence to just plain
>> HEBREW LETTER HE.
>>
>> ~mark
>>
>> On 4/1/24 17:39, Mark E. Shoulson via Unicode wrote:
>>> Looking waaaay back to my opus (with Michael Everson) of 1998,
>>> http://std.dkuug.dk/jtc1/sc2/wg2/docs/n1740/n1740.htm, I call to
>>> attention one particular case mentioned there: the case where the
>>> second HEBREW LETTER HE of the Tetragrammaton is made very wide and
>>> another Holy Name (Adonay, ALEF-DALET-NUN-YOD) is printed in smaller
>>> letters inside it. As mentioned last century, this is even now
>>> (well, then) commonly met with, especially in Sephardic prayer books.
>>>
>>> I mention it because I've found a bunch of professional Hebrew fonts
>>> which have a glyph for this special character. Take a look at any
>>> one of many (but not all) of the offerings of the Samtype Foundry at
>>> https://www.myfonts.com/collections/samtype-foundry and you'll see
>>> what I mean. Sometimes it's visible in the sample image, sometimes
>>> it isn't even though it's in the font. They seem to be placing the
>>> glyph at codepoint U+FB50, which is ARABIC LETTER ALEF WASLA
>>> ISOLATED FORM, probably because it's the next character after the
>>> extended Hebrew code-block that ends at U+FB4F HEBREW LIGATURE ALEF
>>> LAMED and because, being in an Arabic codeblock, it has RTL
>>> directionality (while the PUA I think has LTR directionality, which
>>> is most inconvenient.)
>>>
>>> So it seems that this really is a thing being used by typefounders
>>> even now. Probably should be encoded, yes?
My rationale from 1998
>>> of encoding the Tetragrammaton as a glyph in itself was apparently
>>> not accepted, though after a later paper,
>>> https://unicode.org/L2/L2015/15092-hebew-nomina-sacra.pdf and some
>>> discussion, the YOD TRIANGLE U+05EF was encoded. Perhaps this
>>> should be too? I guess as a variant of HE perhaps? (the name in
>>> the subject-header is not meant as a serious proposal for the
>>> glyph-name, though this letter is actually serious, despite the date.)
>>>
>>> ~mark
>>

From wjgo_10009 at btinternet.com Mon Apr 22 13:06:51 2024
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Mon, 22 Apr 2024 19:06:51 +0100 (BST)
Subject: A question about some 1960s single type border units and a question about whether Unicode should encode single type borders
Message-ID: <363d9eca.3c0.18f06fd3f5e.Webtop.101@btinternet.com>

I had in mind asking about something in this mailing list that might of itself be off-topic yet which may be of interest to some of the readers of this mailing list. Yet when working out what to write I found myself considering something which may well be possibly directly on topic, but I am not sure whether it is or not.

The original topic is that I remember that in the mid 1960s I was given a few copies of then recent issues of the Monotype Newsletter when visiting the office of the Monotype Corporation at 43 Fetter Lane, London. In one of these, or maybe in a later issue that was sent to me, was an article about a collection of then newly released single type border units, possibly at one of 24 point, 30 point, or 36 point size.

These were ten national emblem designs, five for constructing a straight border and five for corners. They could be used individually or mixed as desired. There was a rose, a thistle, a leek, a daffodil, a shamrock. Two of each, for a straight line and a corner.

At that time the Monotype Corporation sold matrices for use in casting metal type.
These matrices could be bought by businesses that used Monotype type casting machines. Some businesses cast type for one-off use in-house for printing, some businesses cast in a harder alloy and sold the type thus cast for repeated use in handset printing to people who used printing machines, whether by way of trade, or as hobbyist Private Press printers. My interest was in hobbyist Private Press.

So whereas the Monotype Corporation offered for purchase a vast number of matrices, each business that bought them only bought a selection of them to suit their needs. So such things as this national emblems set need not necessarily become available to Private Presses that bought type from a typefounder. As far as I am aware, it was not.

So I thought that I would ask in this mailing list as to whether those designs have been, or could be please, released in a digital form. Maybe these days in colour versions too.

And then I thought, could they be encoded in regular Unicode?

And I thought, well I cannot say that they would be used in a run of plain text. So maybe no.

Yet the issue that I then wondered about is that files that are not plain text yet which contain Unicode characters are interchanged.

So where does that fit in?

Should Unicode encode characters that are single type borders that might well be used in a rich text document such as a poem surrounded by a border that is sent from one person to another? Or not.

William Overington

Monday 22 April 2024
URL: From asmusf at ix.netcom.com Tue Apr 23 11:30:54 2024 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Tue, 23 Apr 2024 09:30:54 -0700 Subject: A question about some 1960s single type border units and a question about whether Unicode should encode single type borders In-Reply-To: <363d9eca.3c0.18f06fd3f5e.Webtop.101@btinternet.com> References: <363d9eca.3c0.18f06fd3f5e.Webtop.101@btinternet.com> Message-ID: <03deace4-7d49-46e4-9020-f4bafa059f37@ix.netcom.com> If you want to propose symbols, you need to do the legwork and locate documents or books that they were used in. And if that positive evidence exists, the actual encoding decision would be based on the type of usage and the potential for these documents to be digitized for archival and other purposes. There are several places on the Unicode website where you can find instructions for submitting an encoding request and information and explanation for the types of documentation required -- by you, as the submitter. A./ On 4/22/2024 11:06 AM, William_J_G Overington via Unicode wrote: > > I had in mind asking about something in this mailing list that might > of itself be off-topic yet which may be of interest to some of the > readers of this mailing list. Yet when working out what to write I > found myself considering something which may well be possibly directly > on topic, but I am not sure whether it is or not. > > The original topic is that I remember that in the mid 1960s I was > given a few copies of then recent issues of the Monotype Newsletter > when visiting the office of the Monotype Corporation at 43 Fetter > Lane, London. In one of these, or maybe in a later issue that was sent > to me, was an article about a collection of then newly released single > type border units, possibly at one of 24 point, 30 point, or 36 point > size. > > These were ten national emblem designs, five for constructing a > straight border and five for corners. They could be used individually > or mixed as desired. 
There was a rose, a thistle, a leek, a daffodil, > a shamrock. Two of each, for a straight line and a corner. > > At that time the Monotype Corporation sold matrices for use in casting > metal type. These matrices could be bought by businesses that used > Monotype type casting machines. Some businesses cast type for one-off > use in-house for printing, some businesses cast in a harder alloy and > sold the type thus cast for repeated use in handset printing to people > who used printing machines, whether by way of trade, or as hobbyist > Private Press printers. My interest was in hobbyist Private Press. > > So whereas the Monotype Corporation offered for purchase a vast number > of matrices, each business that bought them only bought a selection of > them to suit their needs. So such things as this national emblems set > need not necessarily become available to Private Presses that bought > type from a typefounder. As far as I am aware, it was not. > > So I thought that I would ask in this mailing list as to whether those > designs have been, or could be please, released in a digital form. > Maybe these days in colour versions too. > > And then I thought, could they be encoded in regular Unicode? > > And I thought, well I cannot say that they would be used in a run of > plain text. So maybe no. > > Yet the issue that I then wondered about is that files that are not > plain text yet which contain Unicode characters are interchanged. > > So where does that fit in? > > Should Unicode encode characters that are single type borders that > might well be used in a rich text document such as a poem surrounded > by a border that is sent from one person to another? Or not. > > William Overington > > Monday 22 April 2024 > -------------- next part -------------- An HTML attachment was scrubbed... 
From wjgo_10009 at btinternet.com Fri Apr 26 12:19:06 2024
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Fri, 26 Apr 2024 18:19:06 +0100 (BST)
Subject: Characters with both Han and Latin parts and remembering the New English Calligraphy of the artist Xu Bing
Message-ID:

I have been reading the following document with interest.

https://www.unicode.org/L2/L2024/24125-cjk-abbrev-block.pdf

I know literally almost nothing about CJK characters, yet the glyphs with both Han and Latin parts reminded me of the New English Calligraphy of the artist Xu Bing.

https://www.youtube.com/watch?v=t2GiHwCAz_4

https://www.youtube.com/watch?v=jzs0Z3YLU7I

There is mention of New English Calligraphy in a post in the Unicode mailing list archive.

https://unicode.org/mail-arch/unicode-ml/Archives-Old/UML025/1100.html

An interesting thing is that the encoding problem for characters and glyphs produced using New English Calligraphy mentioned in that post can now be solved using the technique of a tag sequence for each of them in a one-to-one correspondence with the spelling of the English word.

William Overington

Friday 26 April 2024

From wjgo_10009 at btinternet.com Fri Apr 26 18:00:27 2024
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Sat, 27 Apr 2024 00:00:27 +0100 (BST)
Subject: Use of tag characters in a private encoding - is it valid please?
Message-ID: <49c13a7.4aab.18f1ca37cd2.Webtop.101@btinternet.com>

May I ask please, is it valid to use a sequence of tag characters ending in a cancel tag in a private use encoding of characters if the base character of the sequence is a Private Use Area character?

William Overington

Friday 26 April 2024
From ecm.unicode at gmail.com Fri Apr 26 21:26:10 2024
From: ecm.unicode at gmail.com (Erik Carvalhal Miller)
Date: Fri, 26 Apr 2024 22:26:10 -0400
Subject: Use of tag characters in a private encoding - is it valid please?
In-Reply-To: <49c13a7.4aab.18f1ca37cd2.Webtop.101@btinternet.com>
References: <49c13a7.4aab.18f1ca37cd2.Webtop.101@btinternet.com>
Message-ID:

UTS #51 (Unicode Emoji) tells us an emoji tag sequence begins with an emoji tag base, continues with one or more tag characters from U+E0020–U+E007E, and terminates with U+E007F (CANCEL TAG). The core specification's chapter 23 ("Special Areas and Format Characters"), §23.5 ("Private-Use Characters") asserts that a private agreement can override any of the default properties of private-use characters except for those related to normalization. Ergo, it appears that a private agreement that endows a particular private-use character with emoji properties would permit that private-use character to serve as the base of an emoji tag sequence, within the scope of that same private agreement.

On Fri, Apr 26, 2024 at 8:24 PM William_J_G Overington via Unicode <unicode at corp.unicode.org> wrote:
> May I ask please, Is it valid to use a sequence of tag characters ending
> in a cancel tag in a private use encoding of characters if the base
> character of the sequence is a Private Use Area character?
>
> William Overington
>
> Friday 26 April 2024
>

From kenwhistler at sonic.net Fri Apr 26 21:39:44 2024
From: kenwhistler at sonic.net (Ken Whistler)
Date: Fri, 26 Apr 2024 19:39:44 -0700
Subject: Use of tag characters in a private encoding - is it valid please?
In-Reply-To: <49c13a7.4aab.18f1ca37cd2.Webtop.101@btinternet.com> References: <49c13a7.4aab.18f1ca37cd2.Webtop.101@btinternet.com> Message-ID: <662bc6ea-c379-419f-bb64-ce8b5ab2acb7@sonic.net> On 4/26/2024 4:00 PM, William_J_G Overington via Unicode wrote: > > May I ask please, Is it valid to use a sequence of tag characters > ending in a cancel tag in a private use encoding of characters if the > base character of the sequence is a Private Use Area character? > Of course. Just do not expect to be able to interchange your intent reliably, unless you get widespread public agreement about your use both of the PUA characters and your conventions for use of the tag characters. Otherwise, it is likely the person you communicate with will just see: *???????????????* As an example, I would be perfectly within my rights to try to convey today's date, April 26, 2024 with the following Unicode string: ????????? ????????? ????????? ????????? ????????? ????????? ????????? ????????? Perfectly conformant to the Unicode Standard. However, I expect that I would have difficulty convincing other people to use that convention to interchange dates, instead of the string "April 26, 2024" or "2024-04-26" or some similar, already established convention for the formatting of dates. On the bright side, at least they wouldn't be seeing a string of uninterpretable character boxes. ;-) --Ken -------------- next part -------------- An HTML attachment was scrubbed... URL: From ecm.unicode at gmail.com Sat Apr 27 04:18:39 2024 From: ecm.unicode at gmail.com (Erik Carvalhal Miller) Date: Sat, 27 Apr 2024 05:18:39 -0400 Subject: Use of tag characters in a private encoding - is it valid please? 
In-Reply-To: <662bc6ea-c379-419f-bb64-ce8b5ab2acb7@sonic.net>
References: <49c13a7.4aab.18f1ca37cd2.Webtop.101@btinternet.com> <662bc6ea-c379-419f-bb64-ce8b5ab2acb7@sonic.net>
Message-ID:

The tag characters being default ignorable, I would expect a single missing-glyph glyph (representing the private-use code point) to be more likely, though your mileage may vary. Here, for example, is a sequence containing 15 tag characters between a private-use base and the CANCEL TAG: ???????????????????

On Friday, April 26, 2024, Ken Whistler via Unicode <unicode at corp.unicode.org> wrote:
> Just do not expect to be able to interchange your intent reliably, unless
> you get widespread public agreement about your use both of the PUA
> characters and your conventions for use of the tag characters. Otherwise,
> it is likely the person you communicate with will just see:
> *???????????????*
>

From wjgo_10009 at btinternet.com Sat Apr 27 10:59:00 2024
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Sat, 27 Apr 2024 16:59:00 +0100 (BST)
Subject: Use of tag characters in a private encoding - is it valid please?
In-Reply-To: <662bc6ea-c379-419f-bb64-ce8b5ab2acb7@sonic.net>
References: <662bc6ea-c379-419f-bb64-ce8b5ab2acb7@sonic.net>
Message-ID: <4087e65c.51b1.18f20480164.Webtop.101@btinternet.com>

Erik Carvalhal Miller wrote as follows.

> The tag characters being default ignorable, I would expect a single missing-glyph glyph (representing the private-use code point) to be more likely, though your mileage may vary.

Here, for example, is a sequence containing 15 tag characters between a private-use base and the CANCEL TAG: ???????????????????

Thank you for posting the example.

In the Unicode mailing list archive and in the webmail that I use the display was of the mathematical brackets with one glyph that indicates a missing glyph between them.
When I copied the example onto the clipboard and then pasted into WordPad, the display was of the mathematical brackets with seventeen of the glyphs that each indicate a missing glyph between the mathematical brackets.

I saved from WordPad using the Save as feature to save in the file format that WordPad names as a Unicode Text Document, which is a UTF-16 format file with a BYTE ORDER MARK that indicates, in this particular file, that the low byte is stored before the high byte. I used tags.txt as the file name.

I then opened the tags.txt file in the ViewHex.exe program that Erwin Denissen had kindly posted in 2009.

https://forum.high-logic.com/viewtopic.php?p=10579#p10579

From there I can note that the fifteen tag character message is as follows.

This is a test.

I used the Edit Search... facility of the FontCreator program to find that the base character used is as follows.

U+10FFFD

A good choice, at the top of the map, so maybe it can be thought of as NORTH STAR

There was a famous early steam locomotive named NORTH STAR so one may, if one so chooses, think of the NORTH STAR locomotive hauling a train of tag characters with the CANCEL TAG as a brake van at the end of the train.

https://en.wikipedia.org/wiki/GWR_Star_Class

An OpenType font used in an OpenType-aware application program can be used to decode the base character and the sequence of tag characters.

I tried the technique when the QID emoji proposal was being considered and the technique worked well.

https://forum.high-logic.com/viewtopic.php?p=39337

The technique can be compared and contrasted with the use of a direct Private Use Area encoding for a character that one has designed.

A direct Private Use Area encoding is easier to use; a tag sequence encoding provides scope for a greater chance of a unique encoding assisting unambiguous interoperability and archiving.

William Overington

Saturday 27 April 2024
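
[Editorial illustration, not part of the original message.] The decoding described above can be sketched in a few lines of Python. This is only a minimal sketch, assuming the layout discussed in this thread: a base code point, then tag characters from U+E0020 to U+E007E (each one a printable ASCII character shifted by 0xE0000), then U+E007F CANCEL TAG. The function names `encode_tag_sequence` and `decode_tag_sequence` are hypothetical and do not come from any library.

```python
# Tag characters U+E0020..U+E007E mirror printable ASCII at a fixed offset.
TAG_OFFSET = 0xE0000
CANCEL_TAG = 0xE007F
PUA_BASE = 0x10FFFD  # the private-use base identified in this thread


def encode_tag_sequence(text, base=PUA_BASE):
    """Return base + one tag character per ASCII character + CANCEL TAG."""
    if not text:
        raise ValueError("at least one tag character is required")
    for ch in text:
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"not printable ASCII: {ch!r}")
    tags = "".join(chr(TAG_OFFSET + ord(ch)) for ch in text)
    return chr(base) + tags + chr(CANCEL_TAG)


def decode_tag_sequence(seq):
    """Return (base code point, ASCII payload) for a well-formed sequence."""
    if len(seq) < 3 or ord(seq[-1]) != CANCEL_TAG:
        raise ValueError("expected base, one or more tags, then CANCEL TAG")
    payload = []
    for ch in seq[1:-1]:
        cp = ord(ch)
        if not 0xE0020 <= cp <= 0xE007E:
            raise ValueError(f"not a tag character: U+{cp:04X}")
        payload.append(chr(cp - TAG_OFFSET))
    return ord(seq[0]), "".join(payload)


seq = encode_tag_sequence("This is a test.")
base, text = decode_tag_sequence(seq)
print(len(seq))               # 17 code points: base + 15 tags + CANCEL TAG
print(f"U+{base:04X}", text)  # U+10FFFD This is a test.
```

The count of 17 code points matches the seventeen missing-glyph boxes reported from WordPad, which evidently does not treat the tag characters as default ignorable.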
From ecm.unicode at gmail.com Sun Apr 28 12:44:37 2024
From: ecm.unicode at gmail.com (Erik Carvalhal Miller)
Date: Sun, 28 Apr 2024 13:44:37 -0400
Subject: Use of tag characters in a private encoding - is it valid please?
In-Reply-To: <4087e65c.51b1.18f20480164.Webtop.101@btinternet.com>
References: <662bc6ea-c379-419f-bb64-ce8b5ab2acb7@sonic.net> <4087e65c.51b1.18f20480164.Webtop.101@btinternet.com>
Message-ID:

Although the angle brackets have Unicode names containing the word "mathematical" and reside in the Miscellaneous Mathematical Symbols-A block, I was thinking of their linguistic use for denoting characters qua characters.

The single missing-glyph glyph you originally saw between them was the fallback display I expected in accordance with the Standard. Note that UTS #51 encourages any implementation that supports emoji tag sequences but has difficulty with a particular sequence to fall back by displaying the base emoji either followed by or overlaid by a "missing-emoji glyph"; since in this case it's not likely that the PUA character would even be recognized as an emoji, the fallback you saw is the best-case scenario one can expect in the absence of a private-use agreement.

On Saturday, April 27, 2024, William_J_G Overington via Unicode <unicode at corp.unicode.org> wrote:
> In the Unicode mailing list archive and in the webmail that I use the
> display was of the mathematical brackets with one glyph that indicates a
> missing glyph between them.
>

From wjgo_10009 at btinternet.com Mon Apr 29 13:06:44 2024
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Mon, 29 Apr 2024 19:06:44 +0100 (BST)
Subject: Use of tag characters in a private encoding - is it valid please?
In-Reply-To: <4087e65c.51b1.18f20480164.Webtop.101@btinternet.com>
References: <4087e65c.51b1.18f20480164.Webtop.101@btinternet.com>
Message-ID: <75c724d3.70f1.18f2b09a95f.Webtop.101@btinternet.com>

Erik Carvalhal Miller wrote as follows.

> Although the angle brackets have Unicode names containing the word
> "mathematical" and reside in the Miscellaneous Mathematical Symbols-A
> block, I was thinking of their linguistic use for denoting characters
> qua characters.

I was unaware of that usage. Thank you for explaining.

> The single missing-glyph glyph you originally saw between them was the
> fallback display I expected in accordance with the Standard.

> Note that UTS #51 encourages any implementation that supports emoji
> tag sequences but has difficulty with a particular sequence to fall
> back by displaying the base emoji either followed by or overlaid by a
> "missing-emoji glyph";

That situation is because the character that is used for the base character of the tag sequence can also be used on its own for its original meaning. I am not suggesting (within the limits of the usage being discussed here, as anyone may use a Private Use character for their own purpose) using U+10FFFD other than as the base character for a tag sequence. If the OpenType font recognizes a particular sequence of the base character and some tag characters as if a ligature and displays a substituted glyph accordingly, then no glyph for U+10FFFD will be displayed. So a glyph for U+10FFFD will only be displayed if the font in use does not recognize a particular sequence of the base character and some tag characters.
So, for example, if a font with the suggested glyph for U+10FFFD, and recognizing, say, twenty sequences of the base character and some tag characters, is used to display some text, then the font could respond according to whatever sequences are in the text that is displayed, substituting a glyph or displaying U+10FFFD as appropriate for each sequence encountered.

A font with visible glyphs for tag characters will be helpful for composing sequences and could also be useful for finding the meaning of sequences that are not supported by any font available to the particular end user.

> since in this case it’s not likely that the PUA character would even be recognized as an emoji, the fallback you saw is the best-case scenario one can expect in the absence of a private-use agreement.

Well, I was not restricting myself to emoji in applying the technique of using U+10FFFD followed by a sequence of tag characters of which the final one is a CANCEL TAG. Emoji sometimes, yet other things too.

I had in mind a font where the glyph for U+10FFFD would be a rectangle containing the top half of a question mark with, instead of the dot, a horizontal arrow pointing to the right as seen by the viewer.

I consider that the phrase "private agreement" in The Unicode Standard is, well, not the whole situation, as it is perfectly possible for one person to produce and publish a document declaring some meanings and/or glyphs. So while for anyone else to apply those meanings and/or glyphs does imply at least a tacit, temporary, like watching a science fiction movie suspension of disbelief, sort of agreement, it is not the almost formal contractual situation that The Unicode Standard could be reasonably thought to be writing about.

https://www.unicode.org/versions/Unicode15.0.0/ch23.pdf page 23 of the PDF document

William Overington

Monday 29 April 2024
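The construction described in the message above (U+10FFFD as the base, then tag characters, then CANCEL TAG) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the label "ab1" are hypothetical, the mapping simply offsets printable ASCII into the tag block U+E0020..U+E007E, and nothing here says whether such a sequence is valid under the Standard or how any font would render it.

```python
# Minimal sketch: build and take apart a private-use tag sequence.
# Names and the sample label are hypothetical, not from any specification.

TAG_OFFSET = 0xE0000          # tag characters mirror ASCII at U+E0020..U+E007E
CANCEL_TAG = "\U000E007F"     # U+E007F CANCEL TAG terminates the sequence
PUA_BASE = "\U0010FFFD"       # last private-use code point of Plane 16


def encode_tag_sequence(label: str, base: str = PUA_BASE) -> str:
    """Return base + one tag character per ASCII character + CANCEL TAG."""
    for ch in label:
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"not printable ASCII: {ch!r}")
    tags = "".join(chr(TAG_OFFSET + ord(ch)) for ch in label)
    return base + tags + CANCEL_TAG


def decode_tag_sequence(seq: str, base: str = PUA_BASE) -> str:
    """Recover the ASCII label from a sequence built by encode_tag_sequence."""
    if not (seq.startswith(base) and seq.endswith(CANCEL_TAG)):
        raise ValueError("not a base + tags + CANCEL TAG sequence")
    body = seq[len(base):-1]
    if not all(0xE0020 <= ord(ch) <= 0xE007E for ch in body):
        raise ValueError("unexpected character inside the sequence")
    return "".join(chr(ord(ch) - TAG_OFFSET) for ch in body)


if __name__ == "__main__":
    seq = encode_tag_sequence("ab1")
    print(" ".join(f"U+{ord(ch):04X}" for ch in seq))
    print(decode_tag_sequence(seq))
```

A font that does not recognize the sequence would be left with the base character plus default-invisible tag characters, which is the fallback situation the thread is discussing.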
From ecm.unicode at gmail.com Tue Apr 30 23:19:42 2024
From: ecm.unicode at gmail.com (Erik Carvalhal Miller)
Date: Wed, 1 May 2024 00:19:42 -0400
Subject: Use of tag characters in a private encoding - is it valid please?
In-Reply-To: <75c724d3.70f1.18f2b09a95f.Webtop.101@btinternet.com>
References: <4087e65c.51b1.18f20480164.Webtop.101@btinternet.com> <75c724d3.70f1.18f2b09a95f.Webtop.101@btinternet.com>
Message-ID:

On Mon, Apr 29, 2024 at 2:13 PM William_J_G Overington via Unicode <unicode at corp.unicode.org> wrote:

> I consider that the phrase "private agreement" in The Unicode Standard is, well, not the whole situation, as it is perfectly possible for one person to produce and publish a document declaring some meanings and/or glyphs. So while for anyone else to apply those meanings and/or glyphs does imply at least a tacit, temporary, like watching a science fiction movie suspension of disbelief, sort of agreement, it is not the almost formal contractual situation that The Unicode Standard could be reasonably thought to be writing about.
>
> https://www.unicode.org/versions/Unicode15.0.0/ch23.pdf page 23 of the PDF document

The section you cite does not support the obligation of an “almost formal contractual situation”. One of Unicode’s online FAQ pages (https://www.unicode.org/faq/private_use.html) has this to say:

>> Q: What does "private agreement among cooperating parties" mean?
>>
>> A "private agreement" simply refers to the fact that agreement about the interpretation of some set of private-use characters is done privately, outside the context of the standard. The Unicode Standard does not specify any particular interpretation for any private-use character. There is no implication that a private agreement necessarily has any contractual or other legal status - it is simply an agreement between two or more parties about how a particular set of private-use characters should be interpreted.
>>
>> Q: How would I define a private agreement?
>>
>> One can share, or even publish, documentation containing particular assignments for private-use characters, their glyphs, and other relevant information about their interpretation. One can then ask others to use those private-use characters as documented. One can create appropriate fonts and IMEs, or request that others do so.

On Mon, Apr 29, 2024 at 2:13 PM William_J_G Overington via Unicode <unicode at corp.unicode.org> wrote this too:

> A font with visible glyphs for tag characters will be helpful for composing sequences and could also be useful for finding the meaning of sequences that are not supported by any font available to the particular end user.
>
> > since in this case it’s not likely that the PUA character would even be recognized as an emoji, the fallback you saw is the best-case scenario one can expect in the absence of a private-use agreement.
>
> Well, I was not restricting myself to emoji in applying the technique of using U+10FFFD followed by a sequence of tag characters of which the final one is a CANCEL TAG. Emoji sometimes, yet other things too.

That same chapter you linked to, in §23.9 (“Tag Characters”), specifies two usages for tag characters: (1) the now-deprecated language tagging that was their original purpose and (2) emoji tag sequences, as further specified in UTS #51 (as I brought up earlier). You began this thread by asking about validity; my reading is no, a non-emoji private-use tag sequence is not valid according to the Standard. (Nevertheless, you might get it to function anyway.)

It’s not clear why you would want to use tag sequences (emoji or otherwise). The 137,468 private-use code points available are well suited for specialty characters.
The fallback of having your specialty font(s) visibly display the tag characters of a (private-use) well-formed but unrecognized tag sequence, though possibly useful, not only perverts the notion that tag characters are supposed to be invisible in normal rendering but also sets up a needlessly inconsistent system. If it’s important and appropriate for end users to see a fallback display resembling the Basic Latin repertoire, then why not use the Basic Latin characters, so that end users without the benefit of a special font can see them? If it’s not appropriate or important, then why make the sequence characters visible in fallback at all (outside special modes such as composition or “show hidden”)? And if the sequence pieces aren’t to be seen, why use a sequence at all (especially an invalid one), instead of individual private-use code points? The tag characters seem like a needless complication.
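The figure of 137,468 private-use code points quoted in the message above can be checked with a little arithmetic. The sketch below is only an illustration, with hypothetical function names: it sums the three private-use ranges (the BMP Private Use Area plus Planes 15 and 16, each plane minus its two trailing noncharacters) and uses Python's standard `unicodedata` module to confirm the general category "Co" for individual code points.

```python
import unicodedata

# The three private-use ranges, as inclusive (first, last) pairs.
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),        # Private Use Area in the BMP
    (0xF0000, 0xFFFFD),      # Supplementary Private Use Area-A (Plane 15)
    (0x100000, 0x10FFFD),    # Supplementary Private Use Area-B (Plane 16)
]


def count_private_use() -> int:
    """Total number of code points covered by the three ranges."""
    return sum(last - first + 1 for first, last in PRIVATE_USE_RANGES)


def is_private_use(cp: int) -> bool:
    """True if the code point has general category Co (private use)."""
    return unicodedata.category(chr(cp)) == "Co"


if __name__ == "__main__":
    print(count_private_use())        # 6400 + 65534 + 65534
    print(is_private_use(0x10FFFD))   # the base character discussed in the thread
    print(is_private_use(0xFFFFE))    # a noncharacter, not private use
```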