From hsivonen at mozilla.com Fri Feb 2 07:41:10 2024
From: hsivonen at mozilla.com (Henri Sivonen)
Date: Fri, 2 Feb 2024 15:41:10 +0200
Subject: Use case documentation for UTS 46 parameters
Message-ID:

Hi,

The Processing steps in UTS 46 take various boolean flags. Are the use cases for each one documented somewhere? That is, when and why would one want to set each flag to true or false?

--
Henri Sivonen
hsivonen at mozilla.com

From wjgo_10009 at btinternet.com Fri Feb 2 16:33:51 2024
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Fri, 2 Feb 2024 22:33:51 +0000 (GMT)
Subject: Private Use Area characters and the eudcedit program
Message-ID: <78ea0e0b.225a.18d6bf4f134.Webtop.92@btinternet.com>

Regarding the issue raised in the thread

https://forum.affinity.serif.com/index.php?/topic/197938-private-characters-created-with-microsoft-eudceditexe

can anyone explain what is happening please?

William Overington

Friday 2 February 2024

From sosipiuk at gmail.com Fri Feb 2 16:56:24 2024
From: sosipiuk at gmail.com (Sławomir Osipiuk)
Date: Fri, 02 Feb 2024 22:56:24 +0000
Subject: Private Use Area characters and the eudcedit program
In-Reply-To: <78ea0e0b.225a.18d6bf4f134.Webtop.92@btinternet.com>
References: <78ea0e0b.225a.18d6bf4f134.Webtop.92@btinternet.com>
Message-ID: <1706914274948.1201958261.3436189612@gmail.com>

Taking an educated guess: If the software in question uses Windows' font rendering API, it will render the private characters as intended. If the software uses its own rendering functions or libraries, it will not render the custom private characters because it has no awareness of them.

From list+unicode at jdlh.com Fri Feb 2 18:04:40 2024
From: list+unicode at jdlh.com (Jim DeLaHunt)
Date: Fri, 2 Feb 2024 16:04:40 -0800
Subject: Private Use Area characters and the eudcedit program
In-Reply-To: <78ea0e0b.225a.18d6bf4f134.Webtop.92@btinternet.com>
References: <78ea0e0b.225a.18d6bf4f134.Webtop.92@btinternet.com>
Message-ID:

On 2024-02-02 14:33, William_J_G Overington via Unicode wrote:
> Regarding the issue raised in the thread
>
> https://forum.affinity.serif.com/index.php?/topic/197938-private-characters-created-with-microsoft-eudceditexe

The issue appears to be (copying text from that thread to this):

> I have created two "private" characters using the Windows built-in
> eudcedit utility. The first one I have saved to a specific font and
> the second one I have saved to all fonts.
>
> I can locate and copy both characters in Character Map and paste them
> successfully into Notepad and into my CAD programs, but not Affinity
> Publisher (v.1)? Is there a special procedure in Publisher that will
> overcome this, or is the programe not yet equipped to deal with
> private characters?

On 2024-02-02 14:33, William_J_G Overington via Unicode wrote:
> can anyone explain what is happening please?

I can perhaps shed some light, if not explain definitively.

Anyone using EUDCedit would be well advised to learn what Windows has to say about what EUDC is and how it works in Windows. A web search finds:

*End-User-Defined and Private Use Area Characters* (2021)

> End-user-defined characters (EUDC) in double-byte character sets (DBCSs) and private use area (PUA) characters in Unicode are custom characters.
> They can be defined and implemented either by an end user or by another party.... Their use enables users to form names and other words using characters that are not available in standard screen and printer fonts.
>
> The EUDC and PUA characters can be assigned differently, or not assigned at all, on different computers. Some code pages have extensions that reuse the EUDC range, ... a manufacturer might provide a custom set of characters in one of these ranges, ... user groups can attempt to provide additional characters in the PUA. Different combinations of these cases can cause conflict. When creating applications that rely on EUDC or PUA characters, you should keep in mind the conflicting interpretations of an individual code point....

*Character Sets and Fonts* (2021)

> To create an EUDC or PUA character, the user chooses a character value that is within the specified range and adds the glyph to the font in the entry that corresponds to that character value. The user creates the glyph using an EUDC editor or using a font package purchased from a font vendor. Any DBCS font can contain EUDCs, and any Unicode font can contain PUA characters. The font is called a "separate" EUDC/PUA font if it contains only EUDCs. The font is an "integrated" EUDC/PUA font if it contains standard characters as well as EUDCs....
>
> TrueType fonts can be installed either as .ttf files or as .tte files. Since the operating system hides .tte files, applications cannot enumerate or otherwise examine the installed fonts using GDI API functions. On many operating systems, the system default EUDC/PUA font and separate EUDC/PUA fonts are installed as .tte files. Applications such as EUDC editors and the Control Panel must use registry entries to add, modify, and delete such fonts....

The backstory is that end-user defined character handling is a text requirement originating from ideographic scripts, especially in Japan, and from an era when the glyph complement of Japanese fonts was small (c. 5,000 glyphs) compared to the range of ideographic characters listed in dictionaries and fair game to use in text (c. 70,000 characters). Authors wanting to use such "outside characters" (known as "gaiji" in Japanese) in their publications had to resort to special measures like EUDCs. OS and application vendors who wanted to sell to serious publishers in the Japanese market needed to provide EUDC tools.

The need for special measures like EUDC has receded greatly with the arrival of ideographic script fonts with very large glyph repertoires. Use of EUDC tools in ideographic script documents is likely now a niche. Use of EUDC outside of ideographic script context is even more of a niche.

The original question was, "I can [not] locate and copy [my EUDC] characters in... Affinity Publisher (v.1)? Is ... the programe not yet equipped to deal with private characters?"

It seems pretty likely to me that the program is not equipped to deal with EUDC characters. The feature list for Affinity Publisher does not mention Japanese or Chinese typography support. If they do not have ideographic typography as a major feature, they are even more unlikely to have Windows-specific EUDC support.

--
. --Jim DeLaHunt, jdlh at jdlh.com http://blog.jdlh.com/ (http://jdlh.com/)
multilingual websites consultant, Vancouver, B.C., Canada
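
The Microsoft text quoted above says the user "chooses a character value that is within the specified range". As a rough illustration only, the sketch below checks whether a code point falls in one of the three Private Use ranges defined by the Unicode Standard and lists the private-use characters in a string. The function names are made up for this example, and the sample string assumes characters assigned in the BMP Private Use Area; the thread does not say which code points the forum poster actually used.

```python
import unicodedata

# The three Private Use ranges defined by the Unicode Standard.
PRIVATE_USE_RANGES = (
    (0xE000, 0xF8FF),      # Private Use Area (BMP)
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B
)


def is_private_use(code_point: int) -> bool:
    """Return True if code_point lies in one of the Private Use ranges."""
    return any(lo <= code_point <= hi for lo, hi in PRIVATE_USE_RANGES)


def private_use_chars(text: str) -> list[str]:
    """List the private-use characters found in text as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text if is_private_use(ord(ch))]


# Cross-check the hand-written ranges against General_Category "Co"
# as reported by Python's unicodedata module.
assert all(
    is_private_use(cp) == (unicodedata.category(chr(cp)) == "Co")
    for cp in (0x0041, 0x2713, 0xE000, 0xF8FF, 0xF900, 0xF0000, 0x10FFFD)
)

# Hypothetical sample: two private-use characters between ASCII letters.
print(private_use_chars("A\ue000B\uf8ffC"))
```

A check like this only tells an application that a character is private-use. Whether a glyph appears still depends on the font lookup the application performs, which is the point made in the replies.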

From pgcon6 at msn.com Sat Feb 3 12:13:42 2024
From: pgcon6 at msn.com (Peter Constable)
Date: Sat, 3 Feb 2024 18:13:42 +0000
Subject: Private Use Area characters and the eudcedit program
In-Reply-To:
References: <78ea0e0b.225a.18d6bf4f134.Webtop.92@btinternet.com>
Message-ID:

Adding to Jim's excellent answer: the EUDC mechanism that has been in Windows since the 90s is integrated into platform font fallback mechanisms (more specifically, font linking). A privately-defined character is "linked" to (associated with) either particular fonts or all fonts installed in the system, and will display anywhere that makes use of Win32 (GDI, User...), GDI+, Uniscribe or DWrite _unless_ lower-level APIs for drawing text are used that bypass the platform font fallback mechanisms. Affinity products use DWrite, but probably use only those lower-level APIs.

Peter

From freek at macfreek.nl Fri Feb 16 05:27:20 2024
From: freek at macfreek.nl (Freek Dijkstra)
Date: Fri, 16 Feb 2024 12:27:20 +0100
Subject: What's the process for proposing a symbol in the Unicode table?
Message-ID: <24be4921-eeed-4e76-9869-1df9cb08e0af@macfreek.nl>

Hi,

I've long been annoyed that there is no Unicode symbol for the flourish of approval ("krul" or "krulletje"), which is a common symbol used in the Netherlands, mostly in elementary schools, but rarely outside the Netherlands.

1. What is the process for assigning a codepoint to a symbol currently missing from the Unicode tables?
2. Has this character (see references below) been proposed before?

References (found by a simple web search):

* https://en.wikipedia.org/wiki/Flourish_of_approval
* https://graphicdesign.stackexchange.com/questions/58320/what-is-the-name-or-unicode-for-this-symbol-similar-to-?-dutch-called-krul
* https://tex.stackexchange.com/questions/313281/how-to-make-a-krul-unofficial-dutch-symbol-for-ok

Following these links, it is easy to see there is widespread adoption (with links to NRC, one of the national Dutch newspapers, or a video made by NTR, a publicly funded television station).

Note: I'm not a linguist but an IT specialist, and was highly surprised that it's not in Unicode when I needed it some years ago. The Wikipedia and other articles expressed the same surprise. I came across this issue again, so I joined Unicode as a member so I can ask this question.

Regards,
Freek Dijkstra

From bortzmeyer at nic.fr Fri Feb 16 09:50:45 2024
From: bortzmeyer at nic.fr (Stephane Bortzmeyer)
Date: Fri, 16 Feb 2024 16:50:45 +0100
Subject: What's the process for proposing a symbol in the Unicode table?
In-Reply-To: <24be4921-eeed-4e76-9869-1df9cb08e0af@macfreek.nl>
References: <24be4921-eeed-4e76-9869-1df9cb08e0af@macfreek.nl>
Message-ID:

On Fri, Feb 16, 2024 at 12:27:20PM +0100, Freek Dijkstra via Unicode wrote a message of 188 lines which said:

> 1. What is the process for assigning a codepoint to a symbol
> currently missing from the Unicode tables?

http://unicode.org/emoji/proposals.html

From jameskass at code2001.com Fri Feb 16 10:11:13 2024
From: jameskass at code2001.com (James Kass)
Date: Fri, 16 Feb 2024 16:11:13 +0000
Subject: What's the process for proposing a symbol in the Unicode table?
In-Reply-To:
References: <24be4921-eeed-4e76-9869-1df9cb08e0af@macfreek.nl>
Message-ID: <8fbef790-93aa-4abd-bd27-5351177f9532@code2001.com>

On 2024-02-16 3:50 PM, Stephane Bortzmeyer via Unicode wrote:
> On Fri, Feb 16, 2024 at 12:27:20PM +0100,
> Freek Dijkstra via Unicode wrote
> a message of 188 lines which said:
>
>> 1. What is the process for assigning a codepoint to a symbol
>> currently missing from the Unicode tables?
> http://unicode.org/emoji/proposals.html

If the symbol is not an emoji:
https://www.unicode.org/pending/symbol-guidelines.html

Submitting character proposals:
http://www.unicode.org/pending/proposals.html

From asmusf at ix.netcom.com Fri Feb 16 10:41:43 2024
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Fri, 16 Feb 2024 08:41:43 -0800
Subject: What's the process for proposing a symbol in the Unicode table?
In-Reply-To:
References: <24be4921-eeed-4e76-9869-1df9cb08e0af@macfreek.nl>
Message-ID: <101fd78e-673f-43a9-837e-98b018c3c040@ix.netcom.com>

On 2/16/2024 7:50 AM, Stephane Bortzmeyer via Unicode wrote:
> On Fri, Feb 16, 2024 at 12:27:20PM +0100,
> Freek Dijkstra via Unicode wrote
> a message of 188 lines which said:
>
>> 1. What is the process for assigning a codepoint to a symbol
>> currently missing from the Unicode tables?
> http://unicode.org/emoji/proposals.html

This assumes that the "symbol" is an emoji. Which the "Flourish of approval" would not necessarily be, unless the idea was to create an emoji for it, like the check mark.

The Unicode FAQ has pointers to both emoji and other character proposals.

A./
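
For readers unfamiliar with the check mark comparison made above, here is a minimal sketch that prints the names and General_Category values of the existing check mark characters using Python's standard unicodedata module. It also builds the sequence that requests emoji presentation for U+2714 by appending U+FE0F VARIATION SELECTOR-16. The helper name `describe` is invented for this example; how a given platform actually renders the sequence is outside what the code can show.

```python
import unicodedata

VS16 = "\ufe0f"  # VARIATION SELECTOR-16: requests emoji presentation


def describe(ch: str) -> str:
    """Format one character as 'U+XXXX NAME (General_Category XX)'."""
    return (
        f"U+{ord(ch):04X} {unicodedata.name(ch)} "
        f"(General_Category {unicodedata.category(ch)})"
    )


# Existing check mark characters in the Dingbats block.
for ch in ("\u2713", "\u2714", "\u2705"):
    print(describe(ch))

# U+2714 followed by VS16 asks for emoji-style rendering; without the
# selector the character defaults to text presentation.
heavy_check_as_emoji = "\u2714" + VS16
print([f"U+{ord(c):04X}" for c in heavy_check_as_emoji])
```

A flourish of approval encoded as an ordinary symbol would presumably sit alongside characters like these, with General_Category So, but that would be for a proposal and the committees to settle.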

From doug at ewellic.org Fri Feb 16 11:38:58 2024
From: doug at ewellic.org (Doug Ewell)
Date: Fri, 16 Feb 2024 17:38:58 +0000
Subject: What's the process for proposing a symbol in the Unicode table?
In-Reply-To: <101fd78e-673f-43a9-837e-98b018c3c040@ix.netcom.com>
References: <24be4921-eeed-4e76-9869-1df9cb08e0af@macfreek.nl> <101fd78e-673f-43a9-837e-98b018c3c040@ix.netcom.com>
Message-ID:

Asmus Freytag wrote:

>>> 1. What is the process for assigning a codepoint to a
>>> symbol currently missing from the Unicode tables?
>>
>> http://unicode.org/emoji/proposals.html
>
> This assumes that the "symbol" is an emoji. Which the "Flourish of
> approval" would not necessarily be, unless the idea was to create an
> emoji for it, like the check mark.

The OP's post and references seem rather clear that it is intended as a normal character, for use with normal text, often handwritten, and used in plain-text environments (e.g. "mostly in elementary schools" and "for grading schoolwork").

I would think the process for proposing normal characters would need to be followed, and this should not be proposed as an emoji for the purpose of getting it encoded via the easier emoji process.

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org

From asmusf at ix.netcom.com Fri Feb 16 12:34:11 2024
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Fri, 16 Feb 2024 10:34:11 -0800
Subject: What's the process for proposing a symbol in the Unicode table?
In-Reply-To:
References: <24be4921-eeed-4e76-9869-1df9cb08e0af@macfreek.nl> <101fd78e-673f-43a9-837e-98b018c3c040@ix.netcom.com>
Message-ID: <329f4c0f-c6bc-4d52-8b89-2b5cb6cc9204@ix.netcom.com>

On 2/16/2024 9:38 AM, Doug Ewell via Unicode wrote:
> Asmus Freytag wrote:
>
>>>> 1. What is the process for assigning a codepoint to a
>>>> symbol currently missing from the Unicode tables?
>>> http://unicode.org/emoji/proposals.html
>> This assumes that the "symbol" is an emoji.
>> Which the "Flourish of approval" would not necessarily be, unless the
>> idea was to create an emoji for it, like the check mark.
> The OP's post and references seem rather clear that it is intended as a normal character, for use with normal text, often handwritten, and used in plain-text environments (e.g. "mostly in elementary schools" and "for grading schoolwork").
>
> I would think the process for proposing normal characters would need to be followed, and this should not be proposed as an emoji for the purpose of getting it encoded via the easier emoji process.

Well, the similarity to a check mark is there.

We usually don't encode characters intended for use in handwriting, except if they are needed to digitally archive manuscripts. Not sure grade school papers pass that bar. However, I could be wrong and the details depend on how the case for encoding is argued.

In contrast, there are signs that are normally written by hand that also qualify as standing for an idea, that would be natural to incorporate in informal writing, which is the case for the check mark. If placing the mark in a text environment where emoji would normally be used, would it be seen and understood as "approved" in Dutch culture? Would anyone use it that way? Would a Netherlands-based Consortium have long since added it to their collection?

I don't have any of the answers. It's up to the submitters.

A./

From freek at macfreek.nl Fri Feb 16 12:40:03 2024
From: freek at macfreek.nl (Freek Dijkstra)
Date: Fri, 16 Feb 2024 19:40:03 +0100
Subject: What's the process for proposing a symbol in the Unicode table?
In-Reply-To:
References: <24be4921-eeed-4e76-9869-1df9cb08e0af@macfreek.nl> <101fd78e-673f-43a9-837e-98b018c3c040@ix.netcom.com>
Message-ID: <8a48abf7-e1b7-4694-a27f-2b614f042d02@macfreek.nl>

All, thank you for the responses!

Indeed, this is not an emoji, but a symbol very akin to a checkmark, which is found in the Dingbats table (https://www.unicode.org/charts/PDF/U2700.pdf).
According to a newspaper article on its history, it originates somewhere in the 19th century.

So I'll follow the normal character proposal process.

In the meantime, I not only found the forms to fill in at https://www.unicode.org/L2/summary.html, I even found someone who was -just like me- "genuinely mildly irritated" with the fact that there was no codepoint in Unicode, and even created a website to fix this: https://unicode-krul.nl/en

I'll first try to contact them. I suspect that their genuine irritation was mild enough that it was eventually abandoned after seeing the effort it seemingly takes to get this done. :) Let's hope we're more successful this time.

With kind regards,
Freek Dijkstra

From doug at ewellic.org Sat Feb 17 13:18:48 2024
From: doug at ewellic.org (Doug Ewell)
Date: Sat, 17 Feb 2024 19:18:48 +0000
Subject: What's the process for proposing a symbol in the Unicode table?
In-Reply-To: <8a48abf7-e1b7-4694-a27f-2b614f042d02@macfreek.nl>
References: <24be4921-eeed-4e76-9869-1df9cb08e0af@macfreek.nl> <101fd78e-673f-43a9-837e-98b018c3c040@ix.netcom.com> <8a48abf7-e1b7-4694-a27f-2b614f042d02@macfreek.nl>
Message-ID:

Freek Dijkstra wrote:

> In the meantime, I not only found the forms to fill in at
> https://www.unicode.org/L2/summary.html, I even found someone who was
> -just like me- "genuinely mildly irritated" with the fact that there
> was no codepoint in Unicode, and even created a website to fix this:
> https://unicode-krul.nl/en

As you've probably guessed, writing an actual proposal and being available to discuss it with the committees (Script Ad Hoc and Unicode Technical Committee) is much more effective than being irritated that the symbol is not already there. Websites about the symbol and about the irritation, or other lobbying efforts, may feel good but are also not the road to encoding.

In the early days of Unicode and ISO 10646, say 25 or 30 years ago, a missing character might be discovered in an existing, commonly used 8-bit character set, and that was often enough to get it added to Unicode. For quite some time now, just about all of the "obvious" characters have been encoded, and it does take more effort to encode new ones, especially those that have never been represented in digital plain text before.

You will want to show in your proposal that there is demand for representing this symbol, which seems to be a handwritten convention, in computerized text. The Dingbats block is not a good analogy; those characters came from laser printers and symbol fonts, and thus by definition were used extensively on computers.

> I'll first try to contact them. I suspect that their genuine
> irritation was mild enough that it was eventually abandoned after
> seeing the effort it seemingly takes to get this done. :) Let's hope
> we're more successful this time.

As above.
The amount of effort required for a symbol like this is reasonable and justified. Not everything that has ever been written is a candidate for encoding as a character. Good justification and evidence for this one will be needed.

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org

From jukkakk at gmail.com Sat Feb 17 13:57:47 2024
From: jukkakk at gmail.com (Jukka K. Korpela)
Date: Sat, 17 Feb 2024 21:57:47 +0200
Subject: What's the process for proposing a symbol in the Unicode table?
In-Reply-To:
References: <24be4921-eeed-4e76-9869-1df9cb08e0af@macfreek.nl> <101fd78e-673f-43a9-837e-98b018c3c040@ix.netcom.com> <8a48abf7-e1b7-4694-a27f-2b614f042d02@macfreek.nl>
Message-ID:

Doug Ewell via Unicode (unicode at corp.unicode.org) wrote:

> You will want to show in your proposal that there is demand for representing this symbol,
> which seems to be a handwritten convention, in computerized text.

I'd like to add to this good advice the point that the symbol should have demonstrable use in text, as a character, as opposed to a hand-drawn symbol in the margin. I think it is less relevant that the symbol has been used in computerized text, i.e. in text in digital format. Obviously, since the symbol does not exist in Unicode, any digital format has had to use an image, or perhaps a Private Use character. But if you can demonstrate use of the symbol as an image in commonly used digital formats and the need for encoding it as a character for use in plain text formats, I think you would have a case.

Yucca, https://jkorpela.fi

From jameskass at code2001.com Sat Feb 17 13:59:50 2024
From: jameskass at code2001.com (James Kass)
Date: Sat, 17 Feb 2024 19:59:50 +0000
Subject: What's the process for proposing a symbol in the Unicode table?
In-Reply-To:
References: <24be4921-eeed-4e76-9869-1df9cb08e0af@macfreek.nl> <101fd78e-673f-43a9-837e-98b018c3c040@ix.netcom.com> <8a48abf7-e1b7-4694-a27f-2b614f042d02@macfreek.nl>
Message-ID:

On 2024-02-17 7:18 PM, Doug Ewell via Unicode wrote:
> As above. The amount of effort required for a symbol like this is
> reasonable and justified. Not everything that has ever been written is a
> candidate for encoding as a character. Good justification and evidence
> for this one will be needed.

The Wikipedia page linked earlier, https://en.wikipedia.org/wiki/Flourish_of_approval ... suggests using the German pfennig symbol (₰) as a substitute for the krul. Evidence that U+20B0 GERMAN PENNY SIGN is being used as a krul in real world computer data interchange could be helpful to a proposal.

From christoph.paeper at crissov.de Sat Feb 17 14:02:05 2024
From: christoph.paeper at crissov.de (Christoph Päper)
Date: Sat, 17 Feb 2024 21:02:05 +0100
Subject: What's the process for proposing a symbol in the Unicode table?
In-Reply-To: <329f4c0f-c6bc-4d52-8b89-2b5cb6cc9204@ix.netcom.com>
References: <329f4c0f-c6bc-4d52-8b89-2b5cb6cc9204@ix.netcom.com>
Message-ID:

Asmus Freytag via Unicode:
>
> We usually don't encode characters intended for use in handwriting, except if they are needed to digitally archive manuscripts. Not sure grade school papers pass that bar.

Every piece of writing might be digitally archived nowadays, even more so in the future. Therefore, every _established_ literal atomic sign should be encodable, so it can be unambiguously read by machines. I strongly believe this includes paralinguistic signs, whereas nonlinguistic signs (e.g. much of ISO 7000) would require an extension of the scope of Unicode (although several graphic symbols from that and other standards already have a codepoint assigned to them).

This one is clearly well established, i.e. has at least one canonical form and meaning, even if its use is geographically limited.
It cannot be represented by a combination of other, already encoded characters.

From christoph.paeper at crissov.de Sat Feb 17 14:17:40 2024
From: christoph.paeper at crissov.de (Christoph Päper)
Date: Sat, 17 Feb 2024 21:17:40 +0100
Subject: What's the process for proposing a symbol in the Unicode table?
In-Reply-To:
References:
Message-ID:

An HTML attachment was scrubbed...

From asmusf at ix.netcom.com Sat Feb 17 15:17:56 2024
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Sat, 17 Feb 2024 13:17:56 -0800
Subject: What's the process for proposing a symbol in the Unicode table?
In-Reply-To:
References:
Message-ID: <259fc498-fba3-4d28-903b-e269ff59911f@ix.netcom.com>

If someone has made a font or if someone is using a substitute Unicode character, that would amount to evidence of the attempt to use the symbol in (digital) text. If actual examples of use of such substitutes in context can be found, it would suggest the type of use.

The comparison with the dingbats is tricky. Yes, the whole set is a legacy set, so actual instances can be found in digitally prepared documents and a value is attached to being able to express that in Unicode plain text.

However, some symbols, like the check mark, are used in ways that might be similar to the way the approval mark might be used. For example, it can also convey approval and is used in an emojified presentation for that purpose.

Being able to express approval with a culturally appropriate icon in this manner is potentially an argument in favor.

The details of a proposal, the documentation of actual use, and a clear exposition of how this symbol has iconic value all would influence an eventual decision.

A./

From freek at macfreek.nl Sat Feb 17 17:26:59 2024
From: freek at macfreek.nl (Freek Dijkstra)
Date: Sun, 18 Feb 2024 00:26:59 +0100
Subject: What's the process for proposing a symbol in the Unicode table?
In-Reply-To: <259fc498-fba3-4d28-903b-e269ff59911f@ix.netcom.com>
References: <259fc498-fba3-4d28-903b-e269ff59911f@ix.netcom.com>
Message-ID:

Hi Asmus and others,

Let me answer a few questions, and at the same time pose some more questions :)

/Asmus Freytag wrote:/
> If placing the mark in a text environment where emoji would normally
> be used, would it be seen and understood as "approved" in Dutch
> culture? Would anyone use it that way?

Here is an example use as part of an older logo used by the organisation (VVN) that performed mandatory safety inspections for vehicles: https://upload.wikimedia.org/wikipedia/commons/4/4e/Goedkeuringskrul_VVN.jpg

/Asmus Freytag wrote:/
> If someone has made a font or if someone is using a substitute Unicode
> character, that would amount to evidence of the attempt to use the
> symbol in (digital) text. If actual examples of use of such
> substitutes in context can be found, it would suggest the type of use.

While I'm not aware of any font or substitute Unicode character (except for unicode-krul.nl, but that's not an independent source), here is a Q&A on StackExchange with a few dozen people trying to get the symbol into an electronic document after all: https://tex.stackexchange.com/questions/313281/how-to-make-a-krul-unofficial-dutch-symbol-for-ok

@James Kass, Christoph Päper: I've also read about the use of the Pfennig symbol or the deleatur as a substitute. However, both the glyph and the meaning are distinctly different. In the last answer of that SE Q&A you'll see an attempt to make it fit nevertheless by hiding part of the glyph (poorly, if I may add).

That said, the SE Q&A does raise a few more serious questions.

1. Would the above be sufficient to show the UTC proof of the need to use it in electronic form? On one hand, I think it is anecdotal evidence; on the other hand, it is real usage. A few decades ago, I participated in a standardization body where "running code and rough consensus" was the motto.
I'm as yet unfamiliar with the mores of the Unicode UTC. If the above is not sufficient, what would be? A statement from a formal linguistic body? Or from a linguistic user group? 2. The Q&A correctly mentions that this character has two distinct glyphs. While I have a personal preference (just because of the way I was taught to write it), I would rather consult an expert linguist about this. It is said to have been around since somewhere in the 19th century, and I do not know how it has changed over the decades, or how its usage differs between regions of the world (besides the Netherlands, it is also used in countries that are former Dutch colonies). /Asmus Freytag wrote:/ > However, some symbols, like the check mark, are used in ways that > might be similar to the way the approval mark might be used. For > example, it can also convey approval and is used in an emojified > presentation for that purpose. 3. Yes. It can convey "approval" but can also mean "incorrect" in Sweden according to https://en.wikipedia.org/wiki/Check_mark#International_differences. And this actually seems to indicate that there are more symbols missing. On that page, the ?/? symbol in Finland is missing from Unicode and Wikipedia uses an image instead (oh, horror), and the hanamaru listed on https://en.wikipedia.org/wiki/O_mark specifically lists a work-around because Unicode is missing that symbol too (last line in the "Unicode" paragraph). I almost get the feeling that Unicode has overlooked a (small) category of these symbols, and only included the English ones. Sadly, my knowledge of those other symbols is limited, so I can only make a proposal for the Flourish of Approval. But just to check: Unicode codepoints represent a glyph, not a meaning, right? So the English ✓ and Swedish ✓ have the same codepoint, even though their meaning is different?
Side note: the check mark seems to come from the letter "v" for "vidit" ("has seen") according to a professor in a Dutch paper, just like the glyph for the Flourish of Approval likely comes from the letter "g", from "goed" ("good") or "gezien" ("seen"). 4. The discussion on character vs emoji, and the legacy set of symbols in the U+2700 table (Dingbats) does raise the question: where should a new symbol be placed? It is a symbol, but the miscellaneous symbols in the U+2700 table (Dingbats) are currently listed under "Emoji & Pictograms". However, this is not a pictogram -- while not a character in an alphabet (which has ordering), it is also not a pictogram (it does not represent a physical object). So looking at https://www.unicode.org/charts/, where should this symbol be placed? With kind regards, Freek From asmusf at ix.netcom.com Sat Feb 17 19:32:46 2024 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Sat, 17 Feb 2024 17:32:46 -0800 Subject: What's the process for proposing a symbol in the Unicode table? In-Reply-To: References: <259fc498-fba3-4d28-903b-e269ff59911f@ix.netcom.com> Message-ID: <7ae9b049-f7fb-4060-9d25-3273ede52dbe@ix.netcom.com> Remember, this list is just a venue for informal discussion that might give you ideas on how to argue the case for encoding and what likely objections you may encounter. It otherwise carries no weight and while it's archived, it's not something anyone would turn to in making decisions. That said, the cited discussion on SE shows that there are reasonable scenarios where this is used as a symbol/punctuation in text. That it would also be "letter-like", that is, derived from a letter shape, makes a case for encoding this as a symbol with text representation. The standalone use on logos makes me wonder whether, should it be available, Dutch users would use it as an emoji (e.g. in text messages).
It can easily be argued from the evidence already shared that (1) Dutch users would readily recognize it and (2) there's a desire to not only have it in text, but also, at times, to have it stand out and act as a full statement of its own, very analogous to a check mark with emoji presentation. I would counsel to not view this as an either / or. Perhaps pursue this as a standard (text presentation) symbol at first, and then later explore whether it falls in the small range of iconic symbols that exist in both text and emoji form -- with the check mark being the obvious analog. The evidence presented in the form of the safety inspection sticker makes the case that this symbol has acquired a generalized use that is not limited to marking student papers. That may have been the origin, but it should not limit UTC in taking into account its apparently much broader use. While the solution presented in the context of the TeX SE works well for TeX / LaTeX, it doesn't work in general typesetting. This would not be the first time that Unicode encodes a symbol that (instead of a PUA font) has first been created as a special TeX macro. That would be useful to point out. Having a macro that creates an outline on the fly is very different from placing a bitmap or other picture in running text. It definitely has parallels to creating outlines that you access with a PUA code - except that the detour via PUA isn't needed in TeX because TeX natively supports named (user defined) macros. A./ From jameskass at code2001.com Sat Feb 17 19:35:06 2024 From: jameskass at code2001.com (James Kass) Date: Sun, 18 Feb 2024 01:35:06 +0000 Subject: What's the process for proposing a symbol in the Unicode table? In-Reply-To: References: <259fc498-fba3-4d28-903b-e269ff59911f@ix.netcom.com> Message-ID: On 2024-02-17 11:26 PM, Freek Dijkstra via Unicode wrote: > I almost get the feeling that Unicode has overlooked a (small) > category of these symbols, and only included the English ones. Sadly, > my knowledge of those other symbols is limited, so I can only make a > proposal for the Flourish of Approval. But just to check: Unicode > codepoints represent a glyph, not a meaning, right? So the English ✓ > and Swedish ✓ have the same codepoint, even though their meaning is > different? Unicode encodes characters rather than glyphs. Please see http://www.unicode.org/reports/tr17/tr17-3.html for more information, specifically section 2.1 for illustrations. The check mark (✓) has one code point because of convention: there was no distinction between Swedish and English usage of the mark in pre-existing character sets. The Unicode repertoire might be perceived as favoring English symbols, but we need to keep in mind that the original goal of Unicode was to standardize existing character sets into a universal encoding which would serve everyone. Many of those existing character sets were developed by English speaking users, hence the possible appearance of favoritism.
Likewise, an even larger batch of those existing character sets were developed by “Westerners”, which can give the appearance of favoritism to non-Western users. But over time, many non-English and non-Western characters have been added to the Unicode repertoire because somebody took the time and made the effort to submit an encoding proposal and escort it through the approval process. From asmusf at ix.netcom.com Sat Feb 17 19:52:40 2024 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Sat, 17 Feb 2024 17:52:40 -0800 Subject: What's the process for proposing a symbol in the Unicode table? In-Reply-To: References: <259fc498-fba3-4d28-903b-e269ff59911f@ix.netcom.com> Message-ID: On 2/17/2024 5:35 PM, James Kass via Unicode wrote: > > > On 2024-02-17 11:26 PM, Freek Dijkstra via Unicode wrote: >> I almost get the feeling that Unicode has overlooked a (small) >> category of these symbols, and only included the English ones. Sadly, >> my knowledge of those other symbols is limited, so I can only make a >> proposal for the Flourish of Approval. But just to check: Unicode >> codepoints represent a glyph, not a meaning, right? So the English ✓ >> and Swedish ✓ have the same codepoint, even though their meaning is >> different? > > Unicode encodes characters rather than glyphs. Please see > http://www.unicode.org/reports/tr17/tr17-3.html for more information, > specifically section 2.1 for illustrations. The check mark (✓) has > one code point because of convention: there was no distinction > between Swedish and English usage of the mark in pre-existing > character sets. The exception might be where some local convention uses both a check mark and some other shape in alternation.
In such cases, there may be an argument in favor of considering the other shape a different symbol instead of implausibly suggesting that the check mark now has a range of acceptable glyph variations that includes the other shape (which would come as a surprise to most users of the existing check mark ...). > > The Unicode repertoire might be perceived as favoring English symbols, > but we need to keep in mind that the original goal of Unicode was to > standardize existing character sets into a universal encoding which > would serve everyone. Many of those existing character sets were > developed by English speaking users, hence the possible appearance of > favoritism. Likewise, an even larger batch of those existing > character sets were developed by “Westerners”, which can give the > appearance of favoritism to non-Western users. But over time, many > non-English and non-Western characters have been added to the Unicode > repertoire because somebody took the time and made the effort to > submit an encoding proposal and escort it through the approval process. > I agree, there's every reason to identify cases where Unicode lacks a way of expressing a local written convention, even outside standard orthographic writing. We definitely should not - as a matter of principle - rule out local equivalents to widely used marks, just because the others are either used in English or have become global. The symbol discussed here is in much more widespread and active use than many of the dead alphabets being added, even if it never becomes popular outside the Netherlands. A./ From steffen at sdaoden.eu Sat Feb 17 17:34:22 2024 From: steffen at sdaoden.eu (Steffen Nurpmeso) Date: Sun, 18 Feb 2024 00:34:22 +0100 Subject: What's the process for proposing a symbol in the Unicode table?
In-Reply-To: References: Message-ID: <20240217233422.nbfws3QQ@steffen%sdaoden.eu> Doug Ewell via Unicode wrote in : That made me think (my local copy is from 2020, i do not recall anything), are "protective signs" part of Unicode already? Some are very hard, almost impossible i'd say, for fonts. But they are very important pictographics. *Very*. https://en.wikipedia.org/wiki/Protective_sign https://de.wikipedia.org/wiki/Schutzzeichen https://de.wikipedia.org/wiki/Barbarastollen_(Freiburg_im_Breisgau) --steffen | |Der Kragenbaer, The moon bear, |der holt sich munter he cheerfully and one by one |einen nach dem anderen runter wa.ks himself off |(By Robert Gernhardt) From ecm.unicode at gmail.com Sat Feb 17 21:17:15 2024 From: ecm.unicode at gmail.com (Erik Carvalhal Miller) Date: Sat, 17 Feb 2024 22:17:15 -0500 Subject: Protective signs In-Reply-To: <20240217233422.nbfws3QQ@steffen%sdaoden.eu> References: <20240217233422.nbfws3QQ@steffen%sdaoden.eu> Message-ID: On Sat, Feb 17, 2024 at 9:30 PM Steffen Nurpmeso via Unicode < unicode at corp.unicode.org> wrote: > That made me think (my local copy is from 2020, i do not recall > anything), are "protective signs" part of Unicode already? > Some are very hard, almost impossible i'd say, for fonts. > But they are very important pictographics. *Very*. > > https://en.wikipedia.org/wiki/Protective_sign > https://de.wikipedia.org/wiki/Schutzzeichen > > https://de.wikipedia.org/wiki/Barbarastollen_(Freiburg_im_Breisgau) Some of the signs are in Unicode: the letters P and G (U+0050, U+0047), the letters P and W (U+0050 again, U+0057), the letters I and C (U+0049, U+0043), a white flag (U+2690 ⚐) and a waving white flag (U+1F3F3 🏳), a flag of the United Nations incorporating its emblem (the regional-indicator sequence U+1F1FA, U+1F1F3 🇺🇳), the letters U and N (U+0055, U+004E), and the thrice-repeatable large orange circle (U+1F7E0 🟠).
Those that are not in the Unicode repertoire are dependent on color and therefore suggest emoji, if ever they should be encoded. Important as they may be, is there a plaintext use case, such as texting an enemy to indicate a hospital? (Note there is also U+1F3E5 HOSPITAL 🏥, which in the font I'm working in incorporates a symbol similar to the red cross.) From jameskass at code2001.com Sat Feb 17 22:22:17 2024 From: jameskass at code2001.com (James Kass) Date: Sun, 18 Feb 2024 04:22:17 +0000 Subject: Protective signs In-Reply-To: References: <20240217233422.nbfws3QQ@steffen%sdaoden.eu> Message-ID: <11db0080-c022-402d-a4a5-d6961a339789@code2001.com> On 2024-02-18 3:17 AM, Erik Carvalhal Miller via Unicode wrote: > Important as they may be, is there a plaintext use case, such as > texting an enemy to indicate a hospital? Wouldn't work if the enemy has our texts blocked. But if the enemy was a terrorist organization looking for sensitive targets, they'd probably be happy to have us point one out. Seriously, though, also wondering if any plain-text encoding requirement exists for those symbols which aren't already available. From ecm.unicode at gmail.com Sat Feb 17 23:11:01 2024 From: ecm.unicode at gmail.com (Erik Carvalhal Miller) Date: Sun, 18 Feb 2024 00:11:01 -0500 Subject: Protective signs In-Reply-To: <11db0080-c022-402d-a4a5-d6961a339789@code2001.com> References: <20240217233422.nbfws3QQ@steffen%sdaoden.eu> <11db0080-c022-402d-a4a5-d6961a339789@code2001.com> Message-ID: On Sat, Feb 17, 2024 at 11:25 PM James Kass via Unicode < unicode at corp.unicode.org> wrote: > On 2024-02-18 3:17 AM, Erik Carvalhal Miller via Unicode wrote: > > Important as they may be, is there a plaintext use case, such as > > texting an enemy to indicate a hospital? > > Wouldn't work if the enemy has our texts blocked. > Unfriended! From asmusf at ix.netcom.com Sun Feb 18 02:18:20 2024 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Sun, 18 Feb 2024 00:18:20 -0800 Subject: What's the process for proposing a symbol in the Unicode table? In-Reply-To: References: <329f4c0f-c6bc-4d52-8b89-2b5cb6cc9204@ix.netcom.com> Message-ID: <0615d66d-44b2-4104-a9a9-1178808906cc@ix.netcom.com> On 2/17/2024 12:02 PM, Christoph Päper via Unicode wrote: > Asmus Freytag via Unicode: >> We usually don't encode characters intended for use in handwriting, except if they are needed to digitally archive manuscripts. Not sure grade school papers pass that bar. > Every piece of writing might be digitally archived nowadays, even more so in the future. Therefore, every _established_ literal atomic sign should be encodable, so it can be unambiguously read by machines. I strongly believe this includes paralinguistic signs, whereas nonlinguistic signs (e.g. much of ISO 7000) would require an extension of the scope of Unicode (although several graphic symbols from that and other standards already have a codepoint assigned to them). > > This one is clearly well established, i.e. has at least one canonical form and meaning, even if its use is geographically limited. It cannot be represented by a combination of other, already encoded characters. > That's an argument a proposal could make, but I'm not sure I'm ready to agree with that analysis. Even if we approach 100% digital archiving, not everything can be, will be or needs to be archived as *plain text*. (Or even rich text). Manuscripts are a good example of handwritten text that benefits from conversion to digital text, because they are the subject of intense scholarship that would benefit from having the usual array of digital text processing available, such as search, and convenient rendering of excerpts. People are studying the marks accompanying cave paintings, such as lines, circles or dots.
One even resembles a hash mark #, making that arguably the oldest uniquely recognizable symbol ever encoded as a character. (Aside: dots and lines don't count, because we encode many different dots and lines). For those studies, there's no overriding need to place the symbols into running text, or to attempt to show sequences of them as plain text. Therefore, such use alone is not sufficient rationale for deciding what constitutes an abstract character, providing a standardized encoding, and assigning properties such as line breaking behavior. The Dutch mark in question is interesting in that it's clearly associated with a well-defined concept and has a recognizable (and conventional) shape. Neither of those two aspects presents any obstacle to encoding. However, the need to represent it in plain text needs to be established and any successful proposal will have to provide an argument that is specific and to the point. The mere claim of a general principle as suggested above is not sufficient to make a persuasive argument for a specific encoding. A./ From unicode at lindenbergsoftware.com Sun Feb 18 04:05:13 2024 From: unicode at lindenbergsoftware.com (Norbert Lindenberg) Date: Sun, 18 Feb 2024 11:05:13 +0100 Subject: What's the process for proposing a symbol in the Unicode table? In-Reply-To: <8fbef790-93aa-4abd-bd27-5351177f9532@code2001.com> References: <24be4921-eeed-4e76-9869-1df9cb08e0af@macfreek.nl> <8fbef790-93aa-4abd-bd27-5351177f9532@code2001.com> Message-ID: <1D2BEB85-1FF7-4CF1-8347-DE6C371B2FB3@lindenbergsoftware.com> > On Feb 16, 2024, at 17:11, James Kass via Unicode wrote: > > On 2024-02-16 3:50 PM, Stephane Bortzmeyer via Unicode wrote: >> On Fri, Feb 16, 2024 at 12:27:20PM +0100, >> Freek Dijkstra via Unicode wrote >> a message of 188 lines which said: >> >>> 1.
What is the process for submitting assigning a codepoint to a symbol >>> currently missing from the Unicode tables? >> http://unicode.org/emoji/proposals.html >> > If the symbol is not an emoji: > https://www.unicode.org/pending/symbol-guidelines.html > > Submitting character proposals: > http://www.unicode.org/pending/proposals.html Proposals for characters other than emoji and Han are reviewed by the Script Ad Hoc, so this page tells you more about the process: https://www.unicode.org/consortium/scriptadhoc.html That page also links to a template for proposing the encoding of a new character: https://www.unicode.org/L2/L2023/23104r-addl-script-template-april2023.pdf > On Feb 18, 2024, at 00:26, Freek Dijkstra via Unicode wrote: > > So looking at https://www.unicode.org/charts/, where should this symbol be placed? Don't worry about that; the SAH can find a code point for your character (see page 2 of the template). Best regards, Norbert From marius.spix at web.de Sun Feb 18 11:25:29 2024 From: marius.spix at web.de (Marius Spix) Date: Sun, 18 Feb 2024 18:25:29 +0100 Subject: Aw: Re: Protective signs In-Reply-To: References: <20240217233422.nbfws3QQ@steffen%sdaoden.eu> Message-ID: Unicode also has ⛑ HELMET WITH WHITE CROSS (U+26D1), which could also be used to mark medical corps. However, the actual origin of that character is the maintenance symbol in Japanese TV broadcast. And like ⚕ STAFF OF AESCULAPIUS (U+2695), it is not an international protective sign. That is the reason why medical corps wear a white armband or patch with a red cross, crescent or crystal. While the red lion with sun is also theoretically protected, it is never used.
From wjgo_10009 at btinternet.com Mon Feb 19 06:53:40 2024 From: wjgo_10009 at btinternet.com (William_J_G Overington) Date: Mon, 19 Feb 2024 12:53:40 +0000 (GMT) Subject: What's the process for proposing a symbol in the Unicode table? In-Reply-To: References: Message-ID: <407348d9.2140.18dc16de220.Webtop.127@btinternet.com> I wonder if the encoding rules are no longer fit for purpose.
The encoding process should aim to be helpful to consumers, not to lead to an agreement to restrict progress. I get the impression - and if I have got it wrong please correct me - that if one were using the krul character in a desktop publishing program, the likely scenario is that there is a large rectangular text frame filling most of the page and containing text in the Dutch language, in, say, 14 point, and there is in the right margin, near the lower edge of the page, a small rectangular text frame into which the krul character is inserted, quite possibly at a larger size than the other text, at, say, 36 point or 48 point. Thus the krul character is not within a line of running text involving other characters as well as itself. I say that the fact that the krul character is not within a line of running text involving other characters as well as itself should not go against the encoding of the krul character as a regular Unicode character. This is because, in practice an end user is likely to want to introduce the krul character from a font. So encoding the krul character in regular Unicode would be helpful to end users and in my opinion being helpful to end users and consumers is what is important in encoding decisions. William Overington Monday 19 February 2024 From cate at cateee.net Mon Feb 19 09:44:58 2024 From: cate at cateee.net (Giacomo Catenazzi) Date: Mon, 19 Feb 2024 16:44:58 +0100 Subject: What's the process for proposing a symbol in the Unicode table? In-Reply-To: <407348d9.2140.18dc16de220.Webtop.127@btinternet.com> References: <407348d9.2140.18dc16de220.Webtop.127@btinternet.com> Message-ID: <62a4f367-6b63-49c2-afd2-4897775f0305@cateee.net> On 19 Feb 2024 13:53, William_J_G Overington via Unicode wrote: > > This is because, in practice an end user is likely to want to introduce > the krul character from a font.
So encoding the krul character in > regular Unicode would be helpful to end users and in my opinion being > helpful to end users and consumers is what is important in encoding > decisions. I agree, but I would not formulate it in such a generic way. It must be useful in practice, not just potentially useful. Being in the Unicode Standard doesn't make any symbol useful to users *per se*, as we see with many technical symbols: they are in Unicode, but impossible to use because nobody makes a good font (or any font). IMHO we lack volunteers (or money). Now it seems it is mostly up to SIL and Google (Noto font), but they still need to implement a lot of missing symbols (and also scripts). This particular case may be simpler: there is no lack of people who understand the character and the glyph (and no strange script rules), but we should be careful not to fall too far behind, which would tell browsers and publishing programs to just start ignoring *second class* characters. So we should weigh more parameters, so that users will get something really useful. (Note: with time things will improve). giacomo From pgcon6 at msn.com Thu Feb 22 13:07:44 2024 From: pgcon6 at msn.com (Peter Constable) Date: Thu, 22 Feb 2024 19:07:44 +0000 Subject: What's the process for proposing a symbol in the Unicode table? In-Reply-To: <407348d9.2140.18dc16de220.Webtop.127@btinternet.com> References: <407348d9.2140.18dc16de220.Webtop.127@btinternet.com> Message-ID: > in practice an end user is likely to want to introduce the krul character from a font. So encoding the krul character in regular Unicode would be helpful to end users and in my opinion being helpful to end users and consumers is what is important in encoding decisions. By this line of reasoning, every icon in any symbol font, such as Font Awesome, would be a candidate for encoding. UTC has already explicitly decided against that argument for encoding.
Moreover, the successful, widespread use of fonts like Font Awesome clearly demonstrates that encoding in Unicode is not necessary for users to easily use graphic symbols in content. The Unicode Standard encodes characters, where ?character? is understood to mean an element of textual content and the encoding is intended for purposes of text processing. Not every graphic element qualifies for encoding simply because it can be presented using a font and placed in a text frame of a DTP application. Cf. https://www.unicode.org/versions/Unicode15.0.0/ch01.pdf Peter From: Unicode On Behalf Of William_J_G Overington via Unicode Sent: Monday, February 19, 2024 5:54 AM To: unicode at corp.unicode.org Subject: Re: What's the process for proposing a symbol in the Unicode table? I wonder if the encoding rules are no longer fit for purpose. The encoding process should be to be helpful to consumers, not to lead to an agreement to restrict progress. I get the impression - and if I have got it wrong please correct me - that if one were using the krul character in a desktop publishing program that the likely scenario is that there is a large rectangular text frame filling most of the page and containing text in the Dutch language, in, say, 14 point, and there is in the right margin, near the lower edge of the page, a small rectangular text frame into which the krul character is inserted, quite possibly at a larger size than the other text, at, say, 36 point or 48 point. Thus the krul character is not within a line of running text involving other characters as well as itself. I say that the fact that the krul character is not within a line of running text involving other characters as well as itself should not go against the encoding of the krul character as a regular Unicode character. This is because, in practice an end user is likely to want to introduce the krul character from a font. 
So encoding the krul character in regular Unicode would be helpful to end users and in my opinion being helpful to end users and consumers is what is important in encoding decisions.

William Overington

Monday 19 February 2024
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From freek at macfreek.nl Thu Feb 22 17:11:29 2024
From: freek at macfreek.nl (Freek Dijkstra)
Date: Fri, 23 Feb 2024 00:11:29 +0100
Subject: What's the process for proposing a symbol in the Unicode table?
In-Reply-To:
References: <407348d9.2140.18dc16de220.Webtop.127@btinternet.com>
Message-ID:

Hi Peter,

Thanks for your references. However, I'm a bit confused with your argument. Are you talking about the krul symbol or about icons in general in the discussion with William?

I can't find the word "icon" in the referred chapter 1 of Unicode 15.0, so I assume you refer to this text in the document:

> Note, however, that the Unicode Standard does not encode
> idiosyncratic, personal, novel, or private-use characters, nor does it
> encode logos or graphics.

In case you refer to the "krul" character I want to propose: that is neither an icon nor a personal or private-use character, nor a logo, nor a graphics. At least not in the sense that it is not a graphical representation of a physical object (like all examples I see on the home page of https://fontawesome.com/icons).

If your argument is referring to the general use case, my apologies. I do not have any opinion about that.

With kind regards,
Freek Dijkstra
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From asmusf at ix.netcom.com Thu Feb 22 19:08:46 2024
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Thu, 22 Feb 2024 17:08:46 -0800
Subject: What's the process for proposing a symbol in the Unicode table?
In-Reply-To:
References: <407348d9.2140.18dc16de220.Webtop.127@btinternet.com>
Message-ID: <8145e1b0-17a7-405e-af1c-715f01547e2b@ix.netcom.com>

On 2/22/2024 3:11 PM, Freek Dijkstra via Unicode wrote:
> In case you refer to the "krul" character I want to propose: that is
> neither an icon nor a personal or private-use character, nor a logo,
> nor a graphics. At least not in the sense that it is not a graphical
> representation of a physical object (like all examples I see on the
> home page of https://fontawesome.com/icons).
>
> If your argument is referring to the general use case, my apologies. I
> do not have any opinion about that.

The way I read the discussion on the list, it had descended into general arguments.

For the specific character, what's needed now is submission of a well-formed proposal. The return on further discussion of this character on this list is probably insignificant.

A./
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From pgcon6 at msn.com Thu Feb 22 20:31:18 2024
From: pgcon6 at msn.com (Peter Constable)
Date: Fri, 23 Feb 2024 02:31:18 +0000
Subject: What's the process for proposing a symbol in the Unicode table?
In-Reply-To:
References: <407348d9.2140.18dc16de220.Webtop.127@btinternet.com>
Message-ID:

Hi, Freek

I was responding to a general principle being put forth by William. The only concern I was expressing with regard to krul was the application of William's principle as an argument for encoding krul. If it's clear that a character is an element of text content with an active user community, then existing font implementations can contribute to a proposal for encoding. But the rationale he was suggesting implied that any graphic symbol that users might want to place on a page warranted encoding so that the symbol can be implemented in fonts. UTC will not buy that.

Others have given useful suggestions for what might provide helpful evidence in an encoding proposal. The mention of users finding workarounds to display something _similar_ in text brought to mind the proposal for encoding the Bitcoin currency symbol: UTC found helpful evidence showing that users were interchanging text using characters that were similar to the currency symbol, enough that the intended meaning might be understood _in context_, but not the same and misinterpreted when not in context.

Peter
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From freek at macfreek.nl Fri Feb 23 10:33:28 2024
From: freek at macfreek.nl (Freek Dijkstra)
Date: Fri, 23 Feb 2024 17:33:28 +0100
Subject: What's the process for proposing a symbol in the Unicode table?
In-Reply-To:
References: <259fc498-fba3-4d28-903b-e269ff59911f@ix.netcom.com>
Message-ID: <5401f3a9-6305-48d4-b3b9-2882f3abc79e@macfreek.nl>

Hi all,

A small status update: I got in touch with the webmaster of https://unicode-krul.nl/en, which was an effort in 2018 (not 2022 as I thought). She is now trying to reach two colleagues from back then, who initiated the effort. I'm curious about their reason for abandoning it (either due to lack of interest or because it did not qualify).
Writing the draft is certainly doable (especially thanks to advice from Asmus Freytag and others), but I would rather build some support first by consulting Dutch language groups, and perhaps get in touch with a linguistic expert, before submitting.

Thanks in particular to Doug Ewell and Norbert Lindenberg for pointing me to the Script Ad-hoc committee. After the draft is ready, and I have consulted some local experts, I will be back to focus on the process, and this committee seems the best place to get started. I'll post a short message at that time, for those curious folks on this list (yes, that's you, if you kept reading till here ;) ).

Regards,
Freek

From wjgo_10009 at btinternet.com Fri Feb 23 11:54:00 2024
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Fri, 23 Feb 2024 17:54:00 +0000 (GMT)
Subject: What's the process for proposing a symbol in the Unicode table?
In-Reply-To:
References:
Message-ID: <5f6c6a1f.5264.18dd71a498f.Webtop.127@btinternet.com>

Asmus Freytag wrote:

> The way I read the discussion on the list, it had descended into
> general arguments.

Not a matter of "descended", the title of the thread is general as is the first question in the first post in this thread.

Also, it is not an "argument", it is a discussion. Sort of a round table, not adversarial.

Peter Constable wrote:

> But the rationale he was suggesting implied that any graphic symbol
> that users might want to place on a page warranted encoding so that
> the symbol can be implemented in fonts. UTC will not buy that.

Well, in fairness, I suppose it does imply that, though that was not my intention in that post. However, I do tend to favour a policy of encoding things that is wider than the policy that is used at present as I opine that that would help progress.

Even if UTC does agree to encode the krul character, it will take some years to become implemented.
In the meantime a Private Use Area character could be used, yet that could lead to ambiguity, though possibly not if used in a PDF document and the font, or a subset of the font, is embedded in the PDF document.

If using an OpenType font in an application that has OpenType capability one could set it up so that the glyph of a krul is displayed when a particular sequence of characters is used. For example, if the sequence %k were used for a krul then the glyph for a krul in the font could be named, say, krulglyph and the following added to the liga table of the font.

    sub percent k -> krulglyph;

I have used that technique for various characters that I have devised.

For example, at one of the Internationalization and Unicode Conferences there was mention of there being no emoji for "I" and "you".

I tried to design some language-independent emoji for those two, and some other, personal pronouns. Things I tried just did not seem to work. However, I devised a set of abstract emoji-compatible glyphs and I like to think that they form a coherent, elegant, colourful, language-independent set of glyphs for personal pronoun characters. Alas, though, I have been told that the Emoji Subcommittee will not encode abstract emoji. I considered that a Private Use Area encoding was unsuitable due to ambiguity issues in interoperability.

So I have devised my own encoding system for them, so "I" is encoded as %11 and "You" as %21 (that is, 2 for second person, 1 for singular). But it is not like having a regular Unicode encoding. I feel that these codes are just not going to get applied very much at all. But there we are.

The bar for getting newly invented characters encoded into regular Unicode is so very high. Is that very high bar reasonable or does it impede progress? Does it mean that only large companies with large resources are able to reach that very high bar?
For example, newly invented characters that show good potential for being applied, and whose application would result in progress, could be encoded into regular Unicode as unambiguous sequences (possibly using tag characters) without using any new characters. That would mean that people could use the characters without being concerned about intellectual property rights.

There could be a renaissance of progress.

William Overington

Friday 23 February 2024
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From julesbertholet at quoi.xyz Tue Feb 27 11:19:28 2024
From: julesbertholet at quoi.xyz (Jules Bertholet)
Date: Tue, 27 Feb 2024 17:19:28 +0000 (UTC)
Subject: Should the Yijing symbols be made East Asian Wide?
Message-ID: <61c220e7627f2d487ff169d354867060b922d67e.camel@quoi.xyz>

UAX 11 (https://www.unicode.org/reports/tr11/#ED7) says of the East_Asian_Width property:

> Neutral (Not East Asian): […] Neutral characters do not occur in legacy East Asian character sets. By extension, they also do not occur in East Asian typography.

However, there are several ranges of characters which are assigned a width of Neutral despite originating from, and being primarily used in, East Asian text.

- The Yijing symbols: these symbols originate from the Yi Jing (https://en.wikipedia.org/wiki/I_Ching), an ancient Chinese divination text. These are encoded in Unicode in the "Yijing Hexagram Symbols" block (https://www.unicode.org/charts/PDF/U4DC0.pdf), as well as under the "Yijing monogram and digram symbols" and "Yijing trigram symbols" subheadings in the "Miscellaneous Symbols" block (https://www.unicode.org/charts/PDF/U2600.pdf).
- The Tai Xuan Jing symbols: these are from another Chinese divination text (https://en.wikipedia.org/wiki/Taixuanjing). Encoded in the block of the same name (https://www.unicode.org/charts/PDF/U1D300.pdf).
- The counting rod units and ideographic tally marks: encoded in the "Counting Rod Numerals" block (https://www.unicode.org/charts/PDF/U1D360.pdf), under the respective subheadings. (This block also contains two Western tally marks which should not be East Asian Wide).

Given the origin and use of these characters, I believe they should be considered East Asian Wide, not Neutral as currently specified. As additional supporting evidence, glibc currently treats the Yijing hexagrams as wide:
https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/unicode-gen/utf8_gen.py;h=e273607b6710811bbbd713fe204100b248d1f7ec;hb=HEAD#l274

Jules Bertholet

From markus.icu at gmail.com Tue Feb 27 12:23:41 2024
From: markus.icu at gmail.com (Markus Scherer)
Date: Tue, 27 Feb 2024 10:23:41 -0800
Subject: Should the Yijing symbols be made East Asian Wide?
In-Reply-To: <61c220e7627f2d487ff169d354867060b922d67e.camel@quoi.xyz>
References: <61c220e7627f2d487ff169d354867060b922d67e.camel@quoi.xyz>
Message-ID:

Hi Jules,

I can't answer your question, but wanted to note that this mailing list can be useful for discussion but is not monitored for making changes. When the discussion settles, and if changes are suggested, remember to submit a request via https://www.unicode.org/reporting.html

Best regards,
markus
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
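
For readers who want to check the East_Asian_Width values discussed in the Yijing thread above, the following is a minimal sketch using Python's standard `unicodedata` module. The list of sample code points and the helper name `width_report` are illustrative choices made here, not something taken from the thread, and the values printed depend on the Unicode Character Database version bundled with the Python build, so they may differ from what the message describes.

```python
import unicodedata

# Sample code points from the ranges named in the thread. The selection is
# an assumption made for illustration, not a complete list of the
# characters under discussion.
SAMPLE_CODE_POINTS = [
    0x268A,   # Miscellaneous Symbols: Yijing monogram
    0x2630,   # Miscellaneous Symbols: Yijing trigram
    0x4DC0,   # Yijing Hexagram Symbols block
    0x1D300,  # Tai Xuan Jing Symbols block
    0x1D360,  # Counting Rod Numerals: counting rod unit digit
    0x1D372,  # Counting Rod Numerals: ideographic tally digit
    0x1D377,  # Counting Rod Numerals: Western tally digit
]


def width_report(code_points):
    """Return (code point label, character name, East_Asian_Width) tuples."""
    rows = []
    for cp in code_points:
        ch = chr(cp)
        # Older databases may not know every character, so supply a default
        # instead of letting unicodedata.name() raise ValueError.
        name = unicodedata.name(ch, "<not in this database>")
        rows.append((f"U+{cp:04X}", name, unicodedata.east_asian_width(ch)))
    return rows


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    for label, name, width in width_report(SAMPLE_CODE_POINTS):
        print(f"{label:8} {width:2} {name}")
```

Running the script prints the database version first, which makes it easy to compare results across Python releases when following up on a property change request.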