From unicode at scott.scolby.com Sun Feb 5 18:01:23 2023
From: unicode at scott.scolby.com (Scott Colby)
Date: Sun, 05 Feb 2023 19:01:23 -0500
Subject: Question About Emoji Gender Encoding "Sign Format" vs. "Object Format"
Message-ID: <8576f35a-261d-4388-aa22-1a73a3b83c8f@app.fastmail.com>

Hello,

I was recently looking at "Update on Emoji Gender and Skintone Support" (https://www.unicode.org/L2/L2020/20196-gender-skintone-update.pdf) and noticed a statement on page 6 that raised my curiosity:

> Note: we cannot use "sex symbols" to denote GI so we must employ
> objects

I attempted to track down this requirement, but have been unable to do so. The closest information I found was Technical Report #51, section 2.3.1 (https://www.unicode.org/reports/tr51/#gender-neutral):

> [H]uman-form emoji should normally be depicted in a gender-neutral
> way unless gender appearance is explicitly specified using an emoji
> ZWJ sequence in one of the ways shown in the following table.

The table then describes "Sign Format" (the one the note from the first document says cannot be used) and "Object Format."

I'm guessing that TR#51 is descriptive of the current state and not prescriptive for future encoding decisions. Is this correct? Can someone point me to the official basis of the prohibition on "Sign Format" for encoding of emojis with explicit genders? Was this a technical decision or was it done for non-technical reasons? Either way, what was the reasoning?

I attempted to find an address for Jennifer Daniel or the Emoji Subcommittee (the authors of the document that piqued this question), but was unable to do so. I hope this is the proper forum for this question. If not, I would appreciate direction to the appropriate one.

Thank you,
Scott Colby

From rolandhuse at gmail.com Thu Feb 9 14:31:34 2023
From: rolandhuse at gmail.com (=?UTF-8?B?Um9sYW5kIEjDvHNl?=)
Date: Thu, 9 Feb 2023 15:31:34 -0500
Subject: Old Hungarian Alphabet Unicode
Message-ID:

Dear Peter, Craig and members of the Unicode Consortium,

First of all, let me express my gratitude for the possibility of becoming an individual member of the Unicode Consortium. I am honored to be here.

My name is Roland Hüse, I am a freelance type designer based in Hungary. I am currently studying and researching Old Hungarian (Rovas Script).
I am working towards the implementation and modernization of this script by creating my new fonts with this script extension as well as designing matching scripts to existing fonts.

However, while creating typefaces I have found out that some of the glyphs from this kind of script are missing.

I have made keyboard layout for Mac and I am in the process of making a custom keyboard App for iOS.

Therefore I would like to open a topic about the unicode standard of this alphabet, which I was pleased to learn that it has been added to the Unicode Standard not long ago.

Would you please let me know if anybody in the group knows about Old Hungarian, or can give me information about how could I discuss this in the future.

Furthermore, what are the necessary steps to propose possible solutions to the addressed matter (Missing codepoints).

Thank you so much,

Kindest regards,
Roland Hüse

https://www.rolandhuse.com

From asmusf at ix.netcom.com Thu Feb 9 16:49:10 2023
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Thu, 9 Feb 2023 14:49:10 -0800
Subject: Old Hungarian Alphabet Unicode
In-Reply-To:
References:
Message-ID: <782af493-4155-d737-4160-b5b7f8d28352@ix.netcom.com>

On 2/9/2023 12:31 PM, Roland Hüse via Unicode wrote:
>
> Furthermore, what are the necessary steps to propose possible
> solutions to the addressed matter (Missing codepoints).

For an answer to that question, please consult the Unicode FAQ. It has a number of helpful answers to questions related to that process.

https://www.unicode.org/faq

A./

PS: from the main Unicode site look for a link to the Unicode Document register and in there, look for the original documents proposing the additions of Old Hungarian. You may have to go back some years. In there, you might find additional info that didn't make it into the standard as well as potentially the names of people knowledgeable about the script.

From beckiergb at gmail.com Thu Feb 9 21:06:12 2023
From: beckiergb at gmail.com (Rebecca Bettencourt)
Date: Thu, 9 Feb 2023 19:06:12 -0800
Subject: Old Hungarian Alphabet Unicode
In-Reply-To: <782af493-4155-d737-4160-b5b7f8d28352@ix.netcom.com>
References: <782af493-4155-d737-4160-b5b7f8d28352@ix.netcom.com>
Message-ID:

On Thu, Feb 9, 2023 at 2:52 PM Asmus Freytag via Unicode <unicode at corp.unicode.org> wrote:

> PS: from the main Unicode site look for a link to the Unicode Document
> register and in there, look for the original documents proposing the
> additions of Old Hungarian. You may have to go back some years. In there,
> you might find additional info that didn't make it into the standard as
> well as potentially the names of people knowledgeable about the script.
>

Here are all such documents I could find. For the most part it starts in 2008 and ends in 2013.
It looks like the Old Hungarian block had a rather tumultuous time getting into Unicode, with multiple conflicting proposals, a whole ad-hoc group being created, controversy over the name (which appears to continue into the present day), etc.

https://www.unicode.org/L2/L1998/98033.pdf
https://www.unicode.org/L2/L2008/08268n3483-oldhungarian.pdf
https://www.unicode.org/L2/L2008/08353-hungarian-native.pdf
https://www.unicode.org/L2/L2008/08354-hungarian-rovas-1-6.pdf
https://www.unicode.org/L2/L2008/08355-n3532-oldhungarian-mapping.pdf
https://www.unicode.org/L2/L2008/08356-n3531-oldhungarian.pdf
https://www.unicode.org/L2/L2009/09059r-n3566r.pdf
https://www.unicode.org/L2/L2009/09092-credo.pdf
https://www.unicode.org/L2/L2009/09142-n3615-oldhungarian.pdf
https://www.unicode.org/L2/L2009/09165-old-hungarian-diffs.pdf
https://www.unicode.org/L2/L2009/09168-n3640-hungarian-adhoc.pdf
https://www.unicode.org/L2/L2009/09240-n3664-punct.pdf
https://www.unicode.org/L2/L2009/09292-hungarian-punct.pdf
https://www.unicode.org/L2/L2009/09333-n3697-hungarian-runic.pdf
https://www.unicode.org/L2/L2009/09400-close-e.pdf
https://www.unicode.org/L2/L2011/11087-szekely.pdf
https://www.unicode.org/L2/L2011/11088-carpathian.pdf
https://www.unicode.org/L2/L2011/11089-khazarian.pdf
https://www.unicode.org/L2/L2011/11165-n4042-hungarian-map.pdf
https://www.unicode.org/L2/L2011/11177-hungarian-comp.pdf
https://www.unicode.org/L2/L2011/11207-n4055.pdf
https://www.unicode.org/L2/L2011/11226-n4080.pdf
https://www.unicode.org/L2/L2011/11242r-n4110r-oldhungarian-adhoc.pdf
https://www.unicode.org/L2/L2011/11337-hungarian-letter.pdf
https://www.unicode.org/L2/L2011/11342-old-hungarian.txt
https://www.unicode.org/L2/L2012/12014-n4183-hungarian.pdf
https://www.unicode.org/L2/L2012/12036-n4196-oldhungarian-chart.pdf
https://www.unicode.org/L2/L2012/12037-n4197-oldhungarian-response.pdf
https://www.unicode.org/L2/L2012/12070-n4222.pdf
https://www.unicode.org/L2/L2012/12073-n4225.pdf
https://www.unicode.org/L2/L2012/12088-rona-tas-letter.pdf
https://www.unicode.org/L2/L2012/12089-rovas-fnd.pdf
https://www.unicode.org/L2/L2012/12168r-n4268r-oldhungarian.pdf
https://www.unicode.org/L2/L2012/12189-n4267.pdf
https://www.unicode.org/L2/L2012/12218-hungarian.pdf
https://www.unicode.org/L2/L2012/12219-rovas-minutes.pdf
https://www.unicode.org/L2/L2012/12331-revised-rovas.pdf
https://www.unicode.org/L2/L2012/12332-rovas-script.pdf
https://www.unicode.org/L2/L2012/12334-n4374-oldhungarian-adhoc.pdf
https://www.unicode.org/L2/L2012/12337-rovas-response.pdf
https://www.unicode.org/L2/L2013/13049-std-proc.pdf
https://www.unicode.org/L2/L2013/13218-n4492-rovas.pdf
https://www.unicode.org/L2/L2021/21115-old-hungarian.pdf
https://www.unicode.org/L2/L2021/21246-old-hungarian-fdbk.pdf
https://www.unicode.org/L2/L2022/22285-old-hungarian.pdf

From aprilop at fn.de Tue Feb 21 02:20:44 2023
From: aprilop at fn.de (Andreas Prilop)
Date: Tue, 21 Feb 2023 08:20:44 +0000
Subject: Zero-Width Joiner U+200D
Message-ID:

I think that

  U+FECC
  medial ain
and
  U+200D U+0639 U+200D
  ZWJ, ain, ZWJ

should look the same, regardless of surrounding text and direction.
Is this correct?
Some programs (such as Firefox) display them differently;
see the attached HTML file.

From aprilop at fn.de Tue Feb 21 02:25:13 2023
From: aprilop at fn.de (Andreas Prilop)
Date: Tue, 21 Feb 2023 08:25:13 +0000
Subject: Zero-Width Joiner U+200D
In-Reply-To:
References:
Message-ID:

See also
https://corp.unicode.org/pipermail/unicode/attachments/20230221/4502ca57/attachment-0001.html

From asmusf at ix.netcom.com Tue Feb 21 02:46:11 2023
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Tue, 21 Feb 2023 00:46:11 -0800
Subject: Zero-Width Joiner U+200D
In-Reply-To:
References:
Message-ID: <7ec9860a-6e17-86aa-9526-8d97382c8a2a@ix.netcom.com>

On 2/21/2023 12:20 AM, Andreas Prilop via Unicode wrote:
> I think that
>
> U+FECC
> medial ain
> and
> U+200D U+0639 U+200D
> ZWJ, ain, ZWJ
>
> should look the same, regardless of surrounding text and direction.
> Is this correct?
> Some programs (such as Firefox) display them differently;
> see the attached HTML file.

Generally, surrounding a character with ZWJ should lead to it being rendered with its medial form. Text direction should have nothing to do with shaping, and shaping context should not extend across ZWJ.

That's my understanding as well, but maybe somebody can throw some light on this issue.

A./

From jukkakk at gmail.com Tue Feb 21 03:19:13 2023
From: jukkakk at gmail.com (Jukka K. Korpela)
Date: Tue, 21 Feb 2023 11:19:13 +0200
Subject: Zero-Width Joiner U+200D
In-Reply-To:
References:
Message-ID:

Andreas Prilop via Unicode (unicode at corp.unicode.org) wrote:

> I think that
>
> U+FECC
> medial ain
> and
> U+200D U+0639 U+200D
> ZWJ, ain, ZWJ
>
> should look the same, regardless of surrounding text and direction.
>

The Standard says at 23.2:
"U+200D zero width joiner is intended to produce a more connected rendering of adjacent characters than would otherwise be the case, if possible. [...] In a sequence like <X, ZWJ, Y>, where a cursive form exists for X but not for Y, the presence of ZWJ requests a cursive form for X.
Otherwise, where neither a ligature nor a cursive connection is available, the ZWJ has no effect."

My interpretation of this is that ZWJ should have no effect when it does not appear between two graphic characters.

In practice, browsers treat the use of ZWJ at the start or end of a string in various ways. For example, Word shows U+200D U+0639 U+200D as initial-form ain, BabelPad as medial-form. When I use Gmail on Chrome, I get ???, i.e. medial-form, but who knows what it will look like in other environments.

Jukka

From asmusf at ix.netcom.com Tue Feb 21 05:01:53 2023
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Tue, 21 Feb 2023 03:01:53 -0800
Subject: Zero-Width Joiner U+200D
In-Reply-To:
References:
Message-ID: <23a469db-fc5e-5037-5da0-01e9e0c13c6a@ix.netcom.com>

On 2/21/2023 1:19 AM, Jukka K. Korpela via Unicode wrote:
> My interpretation of this is that ZWJ should have no effect when it
> does not appear between two graphic characters.

Not what I remember as the intent when we introduced the character way back when. Clearly one of the design features was to also support "didactic" formatting, that is, bracketing a character with joiners or non-joiners to be able to show positional forms in isolation. While later compromises forced the acceptance of the compatibility forms, it was never anticipated that there would be a use case for them that couldn't be realized with standard characters.

I see where you might come to your interpretation of the text as written, but the cases where you might reasonably need adjacent characters to be able to get the requested effect would be more for things like ligatures where it doesn't make sense to display a single character "as it would appear in a ligature" because, unlike positional forms, that concept is not well defined.
(I don't know to what degree in actual layout the positional forms themselves are dependent on knowing the adjacent character, and whether that makes it less straightforward to display a "nominal" positional form, as in a generic "medial ain").

A./

From jukkakk at gmail.com Tue Feb 21 05:49:14 2023
From: jukkakk at gmail.com (Jukka K. Korpela)
Date: Tue, 21 Feb 2023 13:49:14 +0200
Subject: Zero-Width Joiner U+200D
In-Reply-To: <23a469db-fc5e-5037-5da0-01e9e0c13c6a@ix.netcom.com>
References: <23a469db-fc5e-5037-5da0-01e9e0c13c6a@ix.netcom.com>
Message-ID:

Asmus Freytag via Unicode (unicode at corp.unicode.org) wrote:

> Clearly one of the design features was to also support "didactic"
> formatting, that is, bracketing a character with joiners or non-joiners to
> be able to show positional forms in isolation.
>

If that was, and is, the intent, then the standard apparently should be amended, adding such descriptions. At present, the description only deals with the effect of ZWJ when it appears between two graphic characters, and it is difficult to avoid the impression that in other contexts it should be ignored. In any case, there is no requirement or recommendation or even suggestion about its effect when used at the start of a string or at the end of a string.

It would be simple to add a description like "surrounding a character with ZWJ should lead to it being rendered with its medial form" (if the character has such a representation form).

What would be the way to achieve initial or final form for a character presented in isolation?

Jukka

From asmusf at ix.netcom.com Tue Feb 21 06:03:25 2023
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Tue, 21 Feb 2023 04:03:25 -0800
Subject: Zero-Width Joiner U+200D
In-Reply-To:
References: <23a469db-fc5e-5037-5da0-01e9e0c13c6a@ix.netcom.com>
Message-ID: <6f4f18b5-b880-56ee-48a2-99e3d910621c@ix.netcom.com>

On 2/21/2023 3:49 AM, Jukka K. Korpela via Unicode wrote:
> Asmus Freytag via Unicode (unicode at corp.unicode.org) wrote:
>
> Clearly one of the design features was to also support "didactic"
> formatting, that is, bracketing a character with joiners or
> non-joiners to be able to show positional forms in isolation.
>
>
> If that was, and is, the intent, then the standard apparently should
> be amended, adding such descriptions. At present, the description only
> deals with the effect of ZWJ when it appears between two graphic
> characters, and it is difficult to avoid the impression that in other
> contexts it should be ignored. In any case, there is no requirement or
> recommendation or even suggestion about its effect when used at the
> start of a string or at the end of a string.
>
> It would be simple to add a description like "surrounding a character
> with ZWJ should lead to it being rendered with its medial form" (if
> the character has such a representation form).

I think we need to look at whether the language accurately reflects what we were trying to say. I do know that it was revised at one point, when the use of ZWJ was generalized beyond cursive connection.

The interpretation you suggest may be an inadvertent result of that change, or someone had found out why the usage that I always understood as intended is for some reason problematic. In that case, it should be excluded more explicitly, in my view.

> What would be the way to achieve initial or final form for a character
> presented in isolation?
>

One sided "bracketing" of the character with ZWJ and ZWNJ on the other side.
The latter is optional if the didactic use is separated by other whitespace from any adjacent text.

A./

From andrewcwest at gmail.com Tue Feb 21 06:34:20 2023
From: andrewcwest at gmail.com (Andrew West)
Date: Tue, 21 Feb 2023 12:34:20 +0000
Subject: Zero-Width Joiner U+200D
In-Reply-To:
References: <23a469db-fc5e-5037-5da0-01e9e0c13c6a@ix.netcom.com>
Message-ID:

On Tue, 21 Feb 2023 at 11:52, Jukka K. Korpela via Unicode wrote:
>
> Asmus Freytag via Unicode (unicode at corp.unicode.org) wrote:
>
>> Clearly one of the design features was to also support "didactic" formatting, that is, bracketing a character with joiners or non-joiners to be able to show positional forms in isolation.
>
>
> If that was, and is, the intent, then the standard apparently should be amended, adding such descriptions. At present, the description only deals with the effect of ZWJ when it appears between two graphic characters, and it is difficult to avoid the impression that in other contexts it should be ignored. In any case, there is no requirement or recommendation or even suggestion about its effect when used at the start of a string or at the end of a string.

The use of ZWJ to show positional forms in isolation is explicitly discussed in the Unicode Standard for Mongolian (https://www.unicode.org/versions/Unicode15.0.0/ch13.pdf#G27803 p. 559) and Phags-pa (https://www.unicode.org/versions/Unicode15.0.0/ch14.pdf#G40430 pp. 604-605).

Andrew

From jukkakk at gmail.com Tue Feb 21 06:56:02 2023
From: jukkakk at gmail.com (Jukka K. Korpela)
Date: Tue, 21 Feb 2023 14:56:02 +0200
Subject: Zero-Width Joiner U+200D
In-Reply-To: <6f4f18b5-b880-56ee-48a2-99e3d910621c@ix.netcom.com>
References: <23a469db-fc5e-5037-5da0-01e9e0c13c6a@ix.netcom.com> <6f4f18b5-b880-56ee-48a2-99e3d910621c@ix.netcom.com>
Message-ID:

Asmus Freytag via Unicode (unicode at corp.unicode.org) wrote:

> I think we need to look at whether the language accurately reflects what we
> were trying to say. I do know that it was revised at one point, when the
> use of ZWJ was generalized beyond cursive connection.
>

It seems that this took place as early as in Unicode 2.

> The interpretation you suggest may be an inadvertent result of that
> change, or someone had found out why the usage that I always understood as
> intended is for some reason problematic. In that case, it should be
> excluded more explicitly, in my view.
>

In fact, reading chapter 23 onwards, I now see the use of ZWJ's around a character to ask for isolated form. It was just so far from the place that described ZWJ and ZWNJ between adjacent characters, giving the impression that this is their only use. Perhaps it would help to remove the word "adjacent" from "U+200D zero width joiner is intended to produce a more connected rendering of adjacent characters than would otherwise be the case, if possible."

The text describes the use of ZWJ for isolated form and shows this in example 23-1. Sorry for the confusion I caused.

So the answer to Andreas' question is "yes, it should", with the value of "should" roughly as "is intended to, according to the Unicode standard, but a program that renders Unicode characters is not required to obey, or even understand, such rendering suggestions".

Jukka
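
[Editorial illustration, not part of the original thread.] The sequences discussed above can be written out explicitly. The following Python sketch builds the ZWJ/ZWNJ-bracketed strings for U+0639 in logical order and lists the corresponding compatibility presentation forms for comparison. The helper name `didactic_forms` is hypothetical, and the use of ZWNJ on both sides for the isolated form is an assumption based on the "joiners or non-joiners" remark in the thread. Whether a given renderer honors these requests depends on the font and shaping engine, as the thread notes.

```python
import unicodedata

ZWJ = "\u200D"   # ZERO WIDTH JOINER
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
AIN = "\u0639"   # ARABIC LETTER AIN


def didactic_forms(letter):
    """Return strings that request each positional form of `letter` in isolation.

    Hypothetical helper. Sequences are in logical order: a joiner after the
    letter requests a connection to a following letter (initial form), and a
    joiner before it requests a connection to a preceding letter (final form).
    """
    return {
        "isolated": ZWNJ + letter + ZWNJ,  # assumption: non-joiners on both sides
        "initial": ZWNJ + letter + ZWJ,
        "medial": ZWJ + letter + ZWJ,
        "final": ZWJ + letter + ZWNJ,
    }


# Compatibility presentation forms of ain (Arabic Presentation Forms-B).
PRESENTATION_FORMS = {
    "isolated": "\uFEC9",
    "final": "\uFECA",
    "initial": "\uFECB",
    "medial": "\uFECC",
}

forms = didactic_forms(AIN)
for position, sequence in forms.items():
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in sequence)
    compat = PRESENTATION_FORMS[position]
    print(f"{position:9} {code_points}  vs  U+{ord(compat):04X} {unicodedata.name(compat)}")
```

Compatibility normalization (NFKC) maps each presentation form back to plain U+0639, which is one reason the joiner-bracketed sequences are the preferred way to spell these forms in text.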

From gtbot2007 at gmail.com Thu Feb 23 11:33:45 2023
From: gtbot2007 at gmail.com (Gabriel Tellez)
Date: Thu, 23 Feb 2023 12:33:45 -0500
Subject: =?UTF-8?Q?Why_is_=E3=80=87_not_a_Unified_Ideograph=3F?=
Message-ID:

Why is 〇 (U+3007) not a CJK Unified Ideograph?

From duerst at it.aoyama.ac.jp Thu Feb 23 23:30:54 2023
From: duerst at it.aoyama.ac.jp (=?UTF-8?Q?Martin_J=2e_D=c3=bcrst?=)
Date: Fri, 24 Feb 2023 14:30:54 +0900
Subject: =?UTF-8?Q?Re=3a_Why_is_=e3=80=87_not_a_Unified_Ideograph=3f?=
In-Reply-To:
References:
Message-ID: <279a3eed-d8c6-6ce7-fbde-cb1959ce8ebd@it.aoyama.ac.jp>

On 2023-02-24 02:33, Gabriel Tellez via Unicode wrote:
> Why is 〇 (U+3007) not a CJK Unified Ideograph?

This is an interesting question. Essentially, this character is "about half" an ideograph. That means that in some respect, it is an ideograph, but not in others. Let's look at some of these:

For:
- Used together with other ideographs in running text
- Has pronunciations,... like other ideographs
- Has same width as other ideographs

Against:
- Shape: A circle not composed of strokes
  (there are other ideographs that contain round strokes,
   e.g. ? (round), but none with an actual circle)
- Doesn't have a radical
- History: Not used in really old texts (where ???? are used)
- Isn't encoded with other ideographs in the original Japanese/
  Chinese/Korean standards

There may be other reasons. I think in one of his dictionaries, Jack Halpern described that he discovered 〇 as a new ideograph. But it's really just a matter of viewpoint, and Unicode took the viewpoint of the preceding national standards that 〇 isn't a CJK ideograph.

Regards, Martin.
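
[Editorial illustration, not part of the original thread.] One place where the "half an ideograph" status becomes visible in practice is the character's basic properties. The sketch below uses Python's standard `unicodedata` module, which does not expose the Unified_Ideograph property itself, so it only compares the name, general category and numeric value of U+3007 with those of an ordinary unified ideograph. U+96F6 is chosen here purely as an illustrative comparison character.

```python
import unicodedata

# U+3007 is the character under discussion; U+96F6 is an arbitrary
# CJK unified ideograph used only for comparison.
for ch in ("\u3007", "\u96F6"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

# U+3007 carries a numeric value in the Unicode Character Database.
print(unicodedata.numeric("\u3007"))
```

U+3007 reports the general category Nl (letter number), whereas unified ideographs report Lo, so code that tests for ideographs by block or by category alone will treat the two differently.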

From jameskass at code2001.com Fri Feb 24 05:34:33 2023
From: jameskass at code2001.com (James Kass)
Date: Fri, 24 Feb 2023 11:34:33 +0000
Subject: =?UTF-8?Q?Re=3a_Why_is_=e3=80=87_not_a_Unified_Ideograph=3f?=
In-Reply-To: <279a3eed-d8c6-6ce7-fbde-cb1959ce8ebd@it.aoyama.ac.jp>
References: <279a3eed-d8c6-6ce7-fbde-cb1959ce8ebd@it.aoyama.ac.jp>
Message-ID:

On 2023-02-24 5:30 AM, Martin J. Dürst via Unicode wrote:
> - Shape: A circle not composed of strokes
>   (there are other ideographs that contain round strokes,
>    e.g. ? (round), but none with an actual circle)

Three use an oval: ? ? ?

As pointed out, there are some which use unusually rounded strokes, such as ? and ?.

From zsigri at gmail.com Sun Feb 26 07:32:56 2023
From: zsigri at gmail.com (Zsigri Gyula)
Date: Sun, 26 Feb 2023 14:32:56 +0100
Subject: Old Hungarian Alphabet Unicode
Message-ID:

You can do it by defining GSUB Lookups. Create the missing glyphs with custom id's, e.g. create a glyph for the Old Hungarian "kv" ligature with the id uni10CD3_10CEE. Then go to GSUB Lookups and define how to type that glyph.

Normally you use the zero width joiner (U+200D) in ligature definitions so it is a good idea to define a key for U+200D in your keyboard layout. In some applications you can insert a zero width joiner by typing 200D and then pressing Alt+x.

In GSUB Lookups, you give the id of the newly created glyph (uni10CD3_10CEE) and define an input string for it: u10CD3 for Old Hungarian "k", U+200D for the zero width joiner, and u10CEE for Old Hungarian "v". In FontForge, this is how you can access GSUB Lookups: Element > Font Info > Lookups > GSUB.

https://drive.google.com/file/d/1sZh3CUzEvQHv6SJSpfinW8Z0AgHS317-/view?usp=share_link

On Thu, Feb 9, 2023 at 9:57 PM Roland Hüse via Unicode wrote:
>
> Dear Peter, Craig and members of the Unicode Consortium,
>
> First of all, let me express my gratitude for the possibility of becoming an individual member of the Unicode Consortium. I am honored to be here.
>
> My name is Roland Hüse, I am a freelance type designer based in Hungary. I am currently studying and researching Old Hungarian (Rovas Script).
> I am working towards the implementation and modernization of this script by creating my new fonts with this script extension as well as designing matching scripts to existing fonts.
>
> However, while creating typefaces I have found out that some of the glyphs from this kind of script are missing.
>
> I have made keyboard layout for Mac and I am in the process of making a custom keyboard App for iOS.
>
> Therefore I would like to open a topic about the unicode standard of this alphabet, which I was pleased to learn that it has been added to the Unicode Standard not long ago.
>
> Would you please let me know if anybody in the group knows about Old Hungarian, or can give me information about how could I discuss this in the future.
>
> Furthermore, what are the necessary steps to propose possible solutions to the addressed matter (Missing codepoints).
>
> Thank you so much,
>
> Kindest regards,
> Roland Hüse
>
> https://www.rolandhuse.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: GSUB_Lookups.png
Type: image/png
Size: 147515 bytes
Desc: not available
URL:
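
[Editorial illustration, not part of the original thread.] The naming and input convention described in Zsigri Gyula's message can be sketched in a few lines of Python. The helper names `ligature_glyph_name` and `ligature_input` are hypothetical and only mirror the convention from the message (component code points joined with underscores after a `uni` prefix, and ZWJ between the component letters in the typed text). They do not call FontForge or any font library, and the GSUB lookup itself still has to be defined in the font editor.

```python
ZWJ = 0x200D  # ZERO WIDTH JOINER


def ligature_glyph_name(components):
    """Build a glyph id such as 'uni10CD3_10CEE' from component code points."""
    return "uni" + "_".join(f"{cp:04X}" for cp in components)


def ligature_input(components):
    """Build the text a user would type: the components separated by ZWJ."""
    code_points = []
    for index, cp in enumerate(components):
        if index > 0:
            code_points.append(ZWJ)
        code_points.append(cp)
    return "".join(chr(cp) for cp in code_points)


# The "kv" ligature from the message: U+10CD3 followed by U+10CEE.
kv = [0x10CD3, 0x10CEE]
print(ligature_glyph_name(kv))
print(" ".join(f"U+{ord(ch):04X}" for ch in ligature_input(kv)))
```

Keeping the ZWJ in the stored text means the ligature request survives in plain text, and a font without the lookup simply shows the two letters side by side.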