From pgcon6 at msn.com Mon Feb 3 11:19:06 2025
From: pgcon6 at msn.com (Peter Constable)
Date: Mon, 3 Feb 2025 17:19:06 +0000
Subject: Odp: RE: Re: Re: Unicode fundamental character identity
In-Reply-To: <073374f2aafd4f259cf3db508996bf1f@grupawp.pl>
References: <709a2fd4e2bd4d2295655ee9431bf09a@grupawp.pl> <2342e8f045c94172af46545b8c1fae4e@grupawp.pl> <5e701b17-444b-16d2-9bed-817f354d7fdb@unicode.org> <41f332d8a568413088782d9d0982715f@grupawp.pl> <3396e2f2cb394320b8d90d7650a38b55@grupawp.pl> <25c51d1e-9002-4be2-9d6b-8b5e5a53beae@code2001.com> <073374f2aafd4f259cf3db508996bf1f@grupawp.pl>
Message-ID:

As stated previously, Unicode makes no guarantee of supporting source separation / round-trip compatibility with HP264x. A proposal for encoding a distinction would need to provide a different line of argumentation for a need to encode.

Peter

From: Unicode On Behalf Of piotrunio-2004 at wp.pl via Unicode
Sent: Friday, January 31, 2025 2:28 PM
To: James Kass ; unicode
Subject: Re: Odp: RE: Re: Re: Unicode fundamental character identity

On 31 January 2025 at 22:08, James Kass via Unicode wrote:

On 2025-01-31 5:42 PM, piotrunio-2004 at wp.pl via Unicode wrote:

I'm not saying that it is, but if I'm relying on arguments relevant to the actual usage of the characters, and the dominant opposing side does not provide all that much coherent of a reasoning in return, then I'm getting suspicious.

Doug and Peter have provided some good advice with respect to such suspicions. My apologies to Peter Constable for my failure to understand exactly what was being dismissed earlier in this thread.

Quoting from https://www.unicode.org/L2/L2025/25010-script-wg-report.pdf

"We deem the differences demonstrated in the proposal to not constitute differences in plain text. No evidence of a document that would make a distinction between the corresponding characters in the different code pages was provided."

That seems coherent enough.
If a simple and concise exhibit can be made showing the desired distinction making a difference in plain text, then that would be a logical next step. Evidence illustrating data loss in round-tripping would also be helpful. Input from the user community supporting retaining distinctions in Unicode should help the effort.

The proposal L2/25-037 already shows a difference in plain text of the HP 264x characters, where 0x12 (2) connects below a vertical or perpendicular diagonal, whereas 0x18 (8) connects below a diagonal of the same direction. Those are different types of connections, which is a plain-text distinction of box drawings.

Data loss in round-tripping is implicitly evident from the information provided in the proposal: if an HP 264x Large Character set mode document has the characters 0x12 0x18, it converts to Unicode as U+1CE2B U+1CE2B, which converted back to HP 264x Large Character set mode is 0x12 0x12. That loses the distinction between the two characters and will appear slightly different from the original document on the HP 264x platform.

From sosipiuk at gmail.com Mon Feb 3 11:36:18 2025
From: sosipiuk at gmail.com (Sławomir Osipiuk)
Date: Mon, 03 Feb 2025 17:36:18 +0000
Subject: Unicode fundamental character identity
In-Reply-To:
References:
Message-ID: <1738603804156.1426487909.77361119@gmail.com>

On Monday, 03 February 2025, 12:19:06 (-05:00), Peter Constable via Unicode wrote:

As stated previously, Unicode makes no guarantee of supporting source separation / round-trip compatibility with HP264x.

I'm honestly surprised by this. I always thought (because it was repeated so many times - must remember repetition does not equal truth) that round-trip compatibility with old character sets was a founding cornerstone of Unicode and so contrastive use (aka source separation) in an old charset would be persuasive evidence for inclusion.
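The round trip described earlier in this thread can be sketched in a few lines. This is only an illustration under stated assumptions: the two mapping tables below are hypothetical and contain nothing beyond the byte values and the code point quoted from the proposal discussion (0x12 and 0x18 both mapping to U+1CE2B); they are not taken from any published mapping file.

```python
# Hypothetical mapping tables, built only from the values quoted in the thread.
HP264X_TO_UNICODE = {0x12: 0x1CE2B, 0x18: 0x1CE2B}  # two source bytes, one code point
UNICODE_TO_HP264X = {0x1CE2B: 0x12}                 # the reverse map can pick only one


def to_unicode(data: bytes) -> str:
    """Convert HP 264x Large Character bytes to a Unicode string."""
    return "".join(chr(HP264X_TO_UNICODE[b]) for b in data)


def to_hp264x(text: str) -> bytes:
    """Convert a Unicode string back to HP 264x Large Character bytes."""
    return bytes(UNICODE_TO_HP264X[ord(ch)] for ch in text)


original = bytes([0x12, 0x18])
round_tripped = to_hp264x(to_unicode(original))

# The second byte comes back as 0x12, so the original distinction is gone.
is_lossless = round_tripped == original
```

Whether that loss matters for plain-text interchange is exactly the question the rest of the thread debates; the sketch only shows that the mapping is not invertible.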
From pgcon6 at msn.com Mon Feb 3 12:46:18 2025
From: pgcon6 at msn.com (Peter Constable)
Date: Mon, 3 Feb 2025 18:46:18 +0000
Subject: Unicode fundamental character identity
In-Reply-To: <1738603804156.1426487909.77361119@gmail.com>
References: <1738603804156.1426487909.77361119@gmail.com>
Message-ID:

Source separation for round-trip compatibility was a principle applied circa 1990 for compatibility with widely-used standards at that time. Today, source separation is not a sufficient criterion for encoding distinctions in other legacy character sets. It can be provided as part of the evidence in a proposal, but other evidence would be required as for any new character proposal, in particular that a text element cannot be adequately represented using any existing character sequences and that there is a significant user community requiring public, plain-text interchange.

Peter

From: Sławomir Osipiuk
Sent: February 3, 2025 10:36 AM
To: Peter Constable ; Peter Constable via Unicode ; piotrunio-2004 at wp.pl; James Kass
Subject: Re: Unicode fundamental character identity

On Monday, 03 February 2025, 12:19:06 (-05:00), Peter Constable via Unicode wrote:

As stated previously, Unicode makes no guarantee of supporting source separation / round-trip compatibility with HP264x.

I'm honestly surprised by this. I always thought (because it was repeated so many times - must remember repetition does not equal truth) that round-trip compatibility with old character sets was a founding cornerstone of Unicode and so contrastive use (aka source separation) in an old charset would be persuasive evidence for inclusion.
From asmusf at ix.netcom.com Mon Feb 3 14:24:45 2025
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Mon, 3 Feb 2025 12:24:45 -0800
Subject: Unicode fundamental character identity
In-Reply-To: <1738603804156.1426487909.77361119@gmail.com>
References: <1738603804156.1426487909.77361119@gmail.com>
Message-ID: <603706d9-30e1-4cfc-9ff7-e58856825440@ix.netcom.com>

On 2/3/2025 9:36 AM, Sławomir Osipiuk via Unicode wrote:
> On Monday, 03 February 2025, 12:19:06 (-05:00), Peter Constable via
> Unicode wrote:
>
> As stated previously, Unicode makes no guarantee of supporting
> source separation / round-trip compatibility with HP264x.
>
>
> I'm honestly surprised by this. I always thought (because it was
> repeated so many times - must remember repetition does not equal
> truth) that round-trip compatibility with old character sets was a
> founding cornerstone of Unicode and so contrastive use (aka source
> separation) in an old charset would be persuasive evidence for inclusion.

You guys are talking past each other a bit.

Unicode decided early on to guarantee round-trip to important, widely used character sets of the time. The key interest was to be able to deploy software that worked internally in Unicode but could interface with existing systems without incurring data loss in round trip.

This level of guarantee does not exist for just any character set. It didn't even exist for all character sets then in existence.

However, if conflating two characters causes a particular problem, Unicode has accepted case-by-case requests not to unify them, or even to disunify them.

However, instead of applying a guarantee, the UTC will look at a bit of a cost/benefit analysis, considering the cost of having to encode additional characters (in perpetuity) vs. the benefit for the intended users.

If this is a problem with a single character, I don't really buy the cost savings argument, especially in a case where after adding some extensions, a whole set could be matched.
If there is a group involved, the cost goes up.

On the other hand, I also would like to understand the benefit for the supposed user group. Is it mainly that of avoiding a single pixel infidelity in display only, or are these characters that would need to round-trip, because they might be in data that is entered on a simulated device, processed on a Unicode system and then output again?

I think it's stupid for both sides to fight over a single pixel. Yes, it smells like a bad unification even though the character is arcane (but so are others where minute details matter even though 'nobody' is likely to use that character much). Having a stupidly incomplete mapping can be frustrating, but is being unfaithful going to impact users in any noticeable way?

A./

From prospero at cyber-wizard.com Mon Feb 17 14:11:25 2025
From: prospero at cyber-wizard.com (prospero)
Date: Mon, 17 Feb 2025 21:11:25 +0100
Subject: Why does the spelling (capitalization) of decomposition types differ in DerivedDecompositionType.txt from UnicodeData.txt?
Message-ID:

For example, "Nobreak" in DerivedDecompositionType.txt vs "noBreak" in UnicodeData.txt. If the former is derived from the latter, shouldn't the spelling be identical?

From lists at akphs.com Tue Feb 18 10:04:44 2025
From: lists at akphs.com (Phil Smith III)
Date: Tue, 18 Feb 2025 11:04:44 -0500
Subject: Why does the spelling (capitalization) of decomposition types differ in DerivedDecompositionType.txt from UnicodeData.txt?
In-Reply-To:
References:
Message-ID: <046b01db821e$d4290750$7c7b15f0$@akphs.com>

This sounds interesting, but with no links or other references is a bit opaque. Can you add more information?
-----Original Message-----
From: Unicode On Behalf Of prospero via Unicode
Sent: Monday, February 17, 2025 3:11 PM
To: unicode at corp.unicode.org
Subject: Why does the spelling (capitalization) of decomposition types differ in DerivedDecompositionType.txt from UnicodeData.txt?

For example, "Nobreak" in DerivedDecompositionType.txt vs "noBreak" in UnicodeData.txt. If the former is derived from the latter, shouldn't the spelling be identical?

From prospero at cyber-wizard.com Tue Feb 18 10:59:49 2025
From: prospero at cyber-wizard.com (prospero)
Date: Tue, 18 Feb 2025 17:59:49 +0100
Subject: Why does the spelling (capitalization) of decomposition types differ in DerivedDecompositionType.txt from UnicodeData.txt?
In-Reply-To: <046b01db821e$d4290750$7c7b15f0$@akphs.com>
References: <046b01db821e$d4290750$7c7b15f0$@akphs.com>
Message-ID:

In:

https://www.unicode.org/Public/16.0.0/ucd/UnicodeData.txt

the decomposition type names are camel-cased (surrounded by brackets), like this:

00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;

and:

00A8;DIAERESIS;Sk;0;ON;<compat> 0020 0308;;;;N;SPACING DIAERESIS;;;;

Whereas in:

https://www.unicode.org/Public/16.0.0/ucd/extracted/DerivedDecompositionType.txt

the decomposition type names are capitalized on the first letter only, like this:

00A0 ; Nobreak # Zs NO-BREAK SPACE

and:

FB54 ; Initial # Lo ARABIC LETTER BEEH INITIAL FORM

> Sent: Tuesday, February 18, 2025 at 11:04 AM
> From: "Phil Smith III via Unicode"
> To: "'prospero'" , unicode at corp.unicode.org
> Subject: RE: Why does the spelling (capitalization) of decomposition types differ in DerivedDecompositionType.txt from UnicodeData.txt?
>
> This sounds interesting, but with no links or other references is a bit opaque. Can you add more information?
>
> -----Original Message-----
> From: Unicode On Behalf Of prospero via Unicode
> Sent: Monday, February 17, 2025 3:11 PM
> To: unicode at corp.unicode.org
> Subject: Why does the spelling (capitalization) of decomposition types differ in DerivedDecompositionType.txt from UnicodeData.txt?
>
> For example, "Nobreak" in DerivedDecompositionType.txt vs "noBreak" in UnicodeData.txt. If the former is derived from the latter, shouldn't the spelling be identical?
>
>
>

From prospero at cyber-wizard.com Tue Feb 18 11:05:01 2025
From: prospero at cyber-wizard.com (prospero)
Date: Tue, 18 Feb 2025 18:05:01 +0100
Subject: Why does the spelling (capitalization) of decomposition types differ in DerivedDecompositionType.txt from UnicodeData.txt?
In-Reply-To:
References: <046b01db821e$d4290750$7c7b15f0$@akphs.com>
Message-ID:

> Sent: Tuesday, February 18, 2025 at 11:59 AM
> From: "prospero via Unicode"
> To: lists at akphs.com
> Cc: unicode at corp.unicode.org
> Subject: Re: RE: Why does the spelling (capitalization) of decomposition types differ in DerivedDecompositionType.txt from UnicodeData.txt?
>
> In: https://www.unicode.org/Public/16.0.0/ucd/UnicodeData.txt
>
> the decomposition type names are camel-cased (surrounded by brackets), like this:

(The brackets are not part of the camel-cased name, and are just there to ease parsing the decomposition field.)

From lists at akphs.com Tue Feb 18 11:33:35 2025
From: lists at akphs.com (Phil Smith III)
Date: Tue, 18 Feb 2025 12:33:35 -0500
Subject: Why does the spelling (capitalization) of decomposition types differ in DerivedDecompositionType.txt from UnicodeData.txt?
In-Reply-To:
References: <046b01db821e$d4290750$7c7b15f0$@akphs.com>
Message-ID: <04be01db822b$3d910380$b8b30a80$@akphs.com>

Thanks. I tend to agree--things that refer to the same thing should be the same. I then wonder, "In what context does this matter, beyond PoE*?"
Not saying it can't/shouldn't -- consistency is good even if it only avoids someone wondering one day whether two things really are the same or not! -- but is there a specific place where this difference causes a problem? Having one should make the argument even stronger for fixing it.

...phsiii

*Purity of Essence--see "Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb", 1964

-----Original Message-----
From: prospero
Sent: Tuesday, February 18, 2025 12:00 PM
To: lists at akphs.com
Cc: unicode at corp.unicode.org
Subject: Re: RE: Why does the spelling (capitalization) of decomposition types differ in DerivedDecompositionType.txt from UnicodeData.txt?

In:

https://www.unicode.org/Public/16.0.0/ucd/UnicodeData.txt

the decomposition type names are camel-cased (surrounded by brackets), like this:

00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;

and:

00A8;DIAERESIS;Sk;0;ON;<compat> 0020 0308;;;;N;SPACING DIAERESIS;;;;

Whereas in:

https://www.unicode.org/Public/16.0.0/ucd/extracted/DerivedDecompositionType.txt

the decomposition type names are capitalized on the first letter only, like this:

00A0 ; Nobreak # Zs NO-BREAK SPACE

and:

FB54 ; Initial # Lo ARABIC LETTER BEEH INITIAL FORM

> Sent: Tuesday, February 18, 2025 at 11:04 AM
> From: "Phil Smith III via Unicode"
> To: "'prospero'" , unicode at corp.unicode.org
> Subject: RE: Why does the spelling (capitalization) of decomposition types differ in DerivedDecompositionType.txt from UnicodeData.txt?
>
> This sounds interesting, but with no links or other references is a bit opaque. Can you add more information?
>
> -----Original Message-----
> From: Unicode On Behalf Of prospero via Unicode
> Sent: Monday, February 17, 2025 3:11 PM
> To: unicode at corp.unicode.org
> Subject: Why does the spelling (capitalization) of decomposition types differ in DerivedDecompositionType.txt from UnicodeData.txt?
>
> For example, "Nobreak" in DerivedDecompositionType.txt vs "noBreak" in UnicodeData.txt.
> If the former is derived from the latter, shouldn't the spelling be identical?

From asmusf at ix.netcom.com Tue Feb 18 12:44:22 2025
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Tue, 18 Feb 2025 10:44:22 -0800
Subject: Why does the spelling (capitalization) of decomposition types differ in DerivedDecompositionType.txt from UnicodeData.txt?
In-Reply-To: <046b01db821e$d4290750$7c7b15f0$@akphs.com>
References: <046b01db821e$d4290750$7c7b15f0$@akphs.com>
Message-ID: <1b2983a2-ac48-4008-8b20-f8e951cb6d8d@ix.netcom.com>

The spellings are equivalent under the naming rules. That's all that formally matters. Fixing this now would break any literal-minded parsers for whichever file is changed, while not making a formal difference.

There are enough other idiosyncrasies in the way these files are organized that this one is far from the worst.

The only rule that matters is that any of the values in PropertyValueAliases.txt, when matched without regard to case, hyphens, or underscore, matches all the other ones for the same property value.

For character names, spaces also don't count (but there are 2-3 odd exceptional names that need to be handled specially).

A./

On 2/18/2025 8:04 AM, Phil Smith III via Unicode wrote:
> This sounds interesting, but with no links or other references is a bit opaque. Can you add more information?
>
> -----Original Message-----
> From: Unicode On Behalf Of prospero via Unicode
> Sent: Monday, February 17, 2025 3:11 PM
> To: unicode at corp.unicode.org
> Subject: Why does the spelling (capitalization) of decomposition types differ in DerivedDecompositionType.txt from UnicodeData.txt?
>
> For example, "Nobreak" in DerivedDecompositionType.txt vs "noBreak" in UnicodeData.txt. If the former is derived from the latter, shouldn't the spelling be identical?
>
>

From asmusf at ix.netcom.com Tue Feb 18 21:26:52 2025
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Tue, 18 Feb 2025 19:26:52 -0800
Subject: Why does the spelling (capitalization) of decomposition types differ in DerivedDecompositionType.txt from UnicodeData.txt?
In-Reply-To: <1b2983a2-ac48-4008-8b20-f8e951cb6d8d@ix.netcom.com>
References: <046b01db821e$d4290750$7c7b15f0$@akphs.com> <1b2983a2-ac48-4008-8b20-f8e951cb6d8d@ix.netcom.com>
Message-ID:

On 2/18/2025 10:44 AM, Asmus Freytag via Unicode wrote:
> The spellings are equivalent under the naming rules. That's all that
> formally matters. Fixing this now would break any literal-minded
> parsers for whichever file is changed, while not making a formal
> difference.
>
> There are enough other idiosyncrasies in the way these files are
> organized that this one is far from the worst.
>
> The only rule that matters is that any of the values in
> PropertyValueAliases.txt, when matched without regard to case,
> hyphens, or underscore, matches all the other ones for the same
> property value.

Sorry, badly phrased: any string that matches any of the ...

>
> For character names, spaces also don't count (but there are 2-3 odd
> exceptional names that need to be handled specially).
>
> A./
>
> On 2/18/2025 8:04 AM, Phil Smith III via Unicode wrote:
>> This sounds interesting, but with no links or other references is a
>> bit opaque. Can you add more information?
>>
>> -----Original Message-----
>> From: Unicode On Behalf Of
>> prospero via Unicode
>> Sent: Monday, February 17, 2025 3:11 PM
>> To: unicode at corp.unicode.org
>> Subject: Why does the spelling (capitalization) of decomposition
>> types differ in DerivedDecompositionType.txt from UnicodeData.txt?
>>
>> For example, "Nobreak" in DerivedDecompositionType.txt vs "noBreak"
>> in UnicodeData.txt. If the former is derived from the latter,
>> shouldn't the spelling be identical?
>>
>>
>

From pgcon6 at msn.com Thu Feb 20 10:57:08 2025
From: pgcon6 at msn.com (Peter Constable)
Date: Thu, 20 Feb 2025 16:57:08 +0000
Subject: Why does the spelling (capitalization) of decomposition types differ in DerivedDecompositionType.txt from UnicodeData.txt?
In-Reply-To: <1b2983a2-ac48-4008-8b20-f8e951cb6d8d@ix.netcom.com>
References: <046b01db821e$d4290750$7c7b15f0$@akphs.com> <1b2983a2-ac48-4008-8b20-f8e951cb6d8d@ix.netcom.com>
Message-ID:

PropertyValueAliases.txt has this entry:

dt ; Nb ; Nobreak ; nb

What doesn't seem clear from UAX #44 is whether an alias could be added that is equivalent under name matching rules to an existing alias.

-----Original Message-----
From: Unicode On Behalf Of Asmus Freytag via Unicode
Sent: Tuesday, February 18, 2025 11:44 AM
To: unicode at corp.unicode.org
Subject: Re: Why does the spelling (capitalization) of decomposition types differ in DerivedDecompositionType.txt from UnicodeData.txt?

The spellings are equivalent under the naming rules. That's all that formally matters. Fixing this now would break any literal-minded parsers for whichever file is changed, while not making a formal difference.

There are enough other idiosyncrasies in the way these files are organized that this one is far from the worst.

The only rule that matters is that any of the values in PropertyValueAliases.txt, when matched without regard to case, hyphens, or underscore, matches all the other ones for the same property value.

For character names, spaces also don't count (but there are 2-3 odd exceptional names that need to be handled specially).

A./

On 2/18/2025 8:04 AM, Phil Smith III via Unicode wrote:
> This sounds interesting, but with no links or other references is a bit opaque. Can you add more information?
>
> -----Original Message-----
> From: Unicode On Behalf Of prospero
> via Unicode
> Sent: Monday, February 17, 2025 3:11 PM
> To: unicode at corp.unicode.org
> Subject: Why does the spelling (capitalization) of decomposition types differ in DerivedDecompositionType.txt from UnicodeData.txt?
>
> For example, "Nobreak" in DerivedDecompositionType.txt vs "noBreak" in UnicodeData.txt. If the former is derived from the latter, shouldn't the spelling be identical?
>
>

From markus.icu at gmail.com Thu Feb 20 13:29:02 2025
From: markus.icu at gmail.com (Markus Scherer)
Date: Thu, 20 Feb 2025 11:29:02 -0800
Subject: Why does the spelling (capitalization) of decomposition types differ in DerivedDecompositionType.txt from UnicodeData.txt?
In-Reply-To:
References: <046b01db821e$d4290750$7c7b15f0$@akphs.com> <1b2983a2-ac48-4008-8b20-f8e951cb6d8d@ix.netcom.com>
Message-ID:

On Thu, Feb 20, 2025 at 8:59 AM Peter Constable via Unicode < unicode at corp.unicode.org> wrote:

> PropertyValueAliases.txt has this entry:
>
> dt ; Nb ; Nobreak ;
> nb
>
> What doesn't seem clear from UAX #44 is whether an alias could be added
> that is equivalent under name matching rules to an existing alias.
>

It seems like that's allowed, but what would be the point?

markus

From pgcon6 at msn.com Fri Feb 21 10:06:57 2025
From: pgcon6 at msn.com (Peter Constable)
Date: Fri, 21 Feb 2025 16:06:57 +0000
Subject: Why does the spelling (capitalization) of decomposition types differ in DerivedDecompositionType.txt from UnicodeData.txt?
In-Reply-To:
References: <046b01db821e$d4290750$7c7b15f0$@akphs.com> <1b2983a2-ac48-4008-8b20-f8e951cb6d8d@ix.netcom.com>
Message-ID:

The only point would be to reflect spelling variants used in UCD data files, which may not be of sufficient benefit to bother.
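The matching rule discussed in this thread (compare property value aliases without regard to case, hyphens, or underscores) can be illustrated with a short sketch. The helper names `loose_key` and `decomposition_tag` are made up for this example, and the folding is deliberately minimal: it also drops whitespace, but it does not implement the additional "is" prefix handling that UAX #44 describes for loose matching, so treat it as an approximation and not as a conformant implementation.

```python
import re


def loose_key(alias: str) -> str:
    """Fold an alias for loose comparison: drop whitespace, underscores
    and hyphens, then lowercase. (Hypothetical helper, simplified.)"""
    return re.sub(r"[\s_-]+", "", alias).lower()


def decomposition_tag(unicode_data_line: str):
    """Return the <tag> at the start of field 5 of a UnicodeData.txt line,
    or None when the field is empty or carries no tag."""
    field = unicode_data_line.split(";")[5]
    match = re.match(r"<([^>]+)>", field)
    return match.group(1) if match else None


# Line for U+00A0 as it appears in UnicodeData.txt.
line = "00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;"
tag = decomposition_tag(line)   # spelling used in UnicodeData.txt
derived = "Nobreak"             # spelling used in DerivedDecompositionType.txt

literal_match = tag == derived                      # what a literal-minded parser sees
loose_match = loose_key(tag) == loose_key(derived)  # what the matching rule sees
```

A parser that compares the folded keys treats the two files as agreeing, which is why the difference in capitalization is formally harmless; a parser that compares the raw strings is the kind that would break if either file were "fixed".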
From: Markus Scherer
Sent: Thursday, February 20, 2025 12:29 PM
To: Peter Constable
Cc: Asmus Freytag ; unicode at corp.unicode.org
Subject: Re: Why does the spelling (capitalization) of decomposition types differ in DerivedDecompositionType.txt from UnicodeData.txt?

On Thu, Feb 20, 2025 at 8:59 AM Peter Constable via Unicode wrote:

PropertyValueAliases.txt has this entry:

dt ; Nb ; Nobreak ; nb

What doesn't seem clear from UAX #44 is whether an alias could be added that is equivalent under name matching rules to an existing alias.

It seems like that's allowed, but what would be the point?

markus

From ecm.unicode at gmail.com Fri Feb 21 13:37:47 2025
From: ecm.unicode at gmail.com (Erik Carvalhal Miller)
Date: Fri, 21 Feb 2025 14:37:47 -0500
Subject: Why does the spelling (capitalization) of decomposition types differ in DerivedDecompositionType.txt from UnicodeData.txt?
In-Reply-To: <1b2983a2-ac48-4008-8b20-f8e951cb6d8d@ix.netcom.com>
References: <046b01db821e$d4290750$7c7b15f0$@akphs.com> <1b2983a2-ac48-4008-8b20-f8e951cb6d8d@ix.netcom.com>
Message-ID:

On Tue, Feb 18, 2025 at 1:47 PM Asmus Freytag via Unicode wrote:
> The only rule that matters is that any of the values in
> PropertyValueAliases.txt, when matched without regard to case, hyphens,
> or underscore, matches all the other ones for the same property value.
>
> For character names, spaces also don't count (but there are 2-3 odd
> exceptional names that need to be handled specially).

One such exception is HANGUL JUNGSEONG O-E (U+1180), in which the hyphen-minus is considered significant, lest that character name collide with HANGUL JUNGSEONG OE (U+116C). Hyphen-minus is also significant in character names when it precedes or follows a space, as in TIBETAN LETTER -A (U+0F60) [cf. TIBETAN LETTER A (U+0F68)].

Additionally, there is a rule that the strings "CHARACTER", "LETTER", and "DIGIT" are to be ignored in character-name matching for determining uniqueness, with a legacy exception for CANCEL (U+0018) and CANCEL CHARACTER (U+0094), both of which are character aliases rather than character names per se but inhabit that same character-name namespace. (However, as I pointed out in L2/24-073 [https://www.unicode.org/L2/L2024/24073-char-namespace.pdf], the "CHARACTER"/"LETTER"/"DIGIT" rule and its exception are given inconsistent treatment in the current text of the Standard.)

From tameiraomatthew968 at gmail.com Sat Feb 22 14:22:12 2025
From: tameiraomatthew968 at gmail.com (Matthew Tameirao)
Date: Sat, 22 Feb 2025 12:22:12 -0800
Subject: Please Make a proposal for Aymara and Paucartambo
Message-ID:

Hello, Please make 2 Scripts for a Unicode Proposal, these 2 scripts are Aymara and Paucartambo. Please make a Proposal for them.

Thank You!

From olopierpa at gmail.com Sat Feb 22 18:03:23 2025
From: olopierpa at gmail.com (Pierpaolo Bernardi)
Date: Sun, 23 Feb 2025 01:03:23 +0100
Subject: Please Make a proposal for Aymara and Paucartambo
In-Reply-To:
References:
Message-ID:

On Sat, Feb 22, 2025 at 10:02 PM Matthew Tameirao via Unicode wrote:
>
> Hello, Please make 2 Scripts for a Unicode Proposal, these 2 scripts are Aymara and Paucartambo. Please make a Proposal for them.

Is there any news with respect to what Michael Everson summarized here:

https://www.unicode.org/notes/tn4/everson-iuc21pap.pdf

?
From tameiraomatthew968 at gmail.com Sun Feb 23 19:50:57 2025
From: tameiraomatthew968 at gmail.com (Matthew Tameirao)
Date: Sun, 23 Feb 2025 17:50:57 -0800
Subject: Please add a few blocks to Provisionally Assigned in Pipeline, Other Proposals, and Roadmap Ideas
Message-ID:

Unicode 17.0

- Sidetic
- Sharada Supplement
- Tolong Siki
- Chisoi
- Beria Erfe
- Tangut Components Supplement
- Miscellaneous Symbols Supplement
- Tai Yo
- CJK Unified Ideographs Extension J

Unicode 18.0 (Provisionally Assigned)

- Northern Paleohispanic
- Southern Paleohispanic
- Sirmauri
- Archaic Cuneiform Numerals
- Mwanwego
- Jurchen
- Jurchen Radicals
- Musical Symbols Supplement
- Persian Siyaq Numbers

Roadmap Ideas

Plane 3 (TIP)

- Seal Script (U+38000-U+3AB9F)
- Oracle Bone (U+3ABA0-U+3BF4F)
- Bronze Script (U+3BF50-U+3D3FF)

Plane 4 (TMP)

- Aymara Pictograms (U+40000-U+403FF)
- Aztec Pictograms (U+40400-U+40BFF)
- Quipu Patterns (U+40C00-U+40FFF)
- ??? (U+41000-U+42FFF)
- Satavahana (U+43000-U+4309F)
- Gupta (U+430A0-U+430DF)
- ??? (U+430E0-U+45FFF)
- Rongorongo (U+46000-U+463FF)
- Paucartambo (U+46400-U+465FF)
- ??? (U+46600-U+4FFFF)

Planes

Plane 0: Basic Multilingual Plane
Plane 1: Supplementary Multilingual Plane
Plane 2: Supplementary Ideographic Plane
Plane 3: Tertiary Ideographic Plane
Plane 4: Tertiary Multilingual Plane
Plane 5: Complementary Multilingual Plane
Plane 6: Home Multilingual Plane
Plane 7: Extended Multilingual Plane
Plane 8: Home Ideographic Plane
Plane 9: Kana Ideographic Plane
Plane 10: Complementary Ideographic Plane
Plane 11: Extended-A Multilingual Plane
Plane 12: Extended-B Multilingual Plane
Plane 13: Extended-C Multilingual Plane
Plane 14: Supplementary Special Purposes Plane
Plane 15: Supplementary Private Use Plane-A
Plane 16: Supplementary Private Use Plane-B
Plane 17: Supplementary Surrogates Plane
Plane 18: Tertiary Special Purposes Plane

Those are all my ideas,

Thank You!
From beckiergb at gmail.com Sun Feb 23 20:13:34 2025
From: beckiergb at gmail.com (Rebecca Bettencourt)
Date: Sun, 23 Feb 2025 18:13:34 -0800
Subject: Please add a few blocks to Provisionally Assigned in Pipeline, Other Proposals, and Roadmap Ideas
In-Reply-To:
References:
Message-ID:

Unicode is nowhere near needing more than the 17 planes it already has. And what are these names? "Home Multilingual Plane"? I'd expect that to be another name for plane 0, not plane 6.

Many of these scripts are already on the roadmap. Some of them we can find no information on. It's no use adding scripts to the roadmap if we don't even know how many characters they have or even if they are scripts at all.

-- Rebecca Bettencourt

On Sun, Feb 23, 2025 at 5:54 PM Matthew Tameirao via Unicode < unicode at corp.unicode.org> wrote:

> Unicode 17.0
>
> - Sidetic
> - Sharada Supplement
> - Tolong Siki
> - Chisoi
> - Beria Erfe
> - Tangut Components Supplement
> - Miscellaneous Symbols Supplement
> - Tai Yo
> - CJK Unified Ideographs Extension J
>
> Unicode 18.0 (Provisionally Assigned)
>
> - Northern Paleohispanic
> - Southern Paleohispanic
> - Sirmauri
> - Archaic Cuneiform Numerals
> - Mwanwego
> - Jurchen
> - Jurchen Radicals
> - Musical Symbols Supplement
> - Persian Siyaq Numbers
>
> Roadmap Ideas
>
> Plane 3 (TIP)
>
> - Seal Script (U+38000-U+3AB9F)
> - Oracle Bone (U+3ABA0-U+3BF4F)
> - Bronze Script (U+3BF50-U+3D3FF)
>
> Plane 4 (TMP)
>
> - Aymara Pictograms (U+40000-U+403FF)
> - Aztec Pictograms (U+40400-U+40BFF)
> - Quipu Patterns (U+40C00-U+40FFF)
> - ??? (U+41000-U+42FFF)
> - Satavahana (U+43000-U+4309F)
> - Gupta (U+430A0-U+430DF)
> - ??? (U+430E0-U+45FFF)
> - Rongorongo (U+46000-U+463FF)
> - Paucartambo (U+46400-U+465FF)
> - ??? (U+46600-U+4FFFF)
>
> Planes
> Plane 0: Basic Multilingual Plane
> Plane 1: Supplementary Multilingual Plane
> Plane 2: Supplementary Ideographic Plane
> Plane 3: Tertiary Ideographic Plane
> Plane 4: Tertiary Multilingual Plane
> Plane 5: Complementary Multilingual Plane
> Plane 6: Home Multilingual Plane
> Plane 7: Extended Multilingual Plane
> Plane 8: Home Ideographic Plane
> Plane 9: Kana Ideographic Plane
> Plane 10: Complementary Ideographic Plane
> Plane 11: Extended-A Multilingual Plane
> Plane 12: Extended-B Multilingual Plane
> Plane 13: Extended-C Multilingual Plane
> Plane 14: Supplementary Special Purposes Plane
> Plane 15: Supplementary Private Use Plane-A
> Plane 16: Supplementary Private Use Plane-B
> Plane 17: Supplementary Surrogates Plane
> Plane 18: Tertiary Special Purposes Plane
>
> Those are all my ideas,
>
> Thank You!
>

From ecm.unicode at gmail.com Mon Feb 24 09:41:59 2025
From: ecm.unicode at gmail.com (Erik Carvalhal Miller)
Date: Mon, 24 Feb 2025 10:41:59 -0500
Subject: Please add a few blocks to Provisionally Assigned in Pipeline, Other Proposals, and Roadmap Ideas
In-Reply-To:
References:
Message-ID:

On Sun, Feb 23, 2025 at 8:55 PM Matthew Tameirao via Unicode wrote:
> Plane 6: Home Multilingual Plane

Where's the Workplace Multilingual Plane? That's where I often encounter entirely different forms of communication.

> Plane 17: Supplementary Surrogates Plane
> Plane 18: Tertiary Special Purposes Plane

If Plane 17 is how one accesses Plane 18, I sense a major architectural flaw in the plane plan. (See .)
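The remarks about the plane count can be checked with a little arithmetic. The sketch below is only an illustration; the function names are invented for it. It shows that UTF-16 surrogate pairs address exactly the 16 supplementary planes, so the code space ends at U+10FFFF in plane 16, and a "Plane 17" or "Plane 18" would lie outside what surrogate pairs can reach.

```python
HIGH_SURROGATES = range(0xD800, 0xDC00)  # 1024 lead code units
LOW_SURROGATES = range(0xDC00, 0xE000)   # 1024 trail code units


def plane_of(code_point: int) -> int:
    """Each plane holds 0x10000 code points, so the plane is the value above 16 bits."""
    return code_point >> 16


def to_surrogates(code_point: int) -> tuple:
    """Encode a supplementary code point as a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


# 1024 * 1024 pairs cover exactly planes 1 through 16.
supplementary_count = len(HIGH_SURROGATES) * len(LOW_SURROGATES)
last_code_point = 0x10000 + supplementary_count - 1
total_planes = plane_of(last_code_point) + 1

# A code point from the suggested Plane 4 ranges is reachable today...
pair_for_plane_4 = to_surrogates(0x40000)
# ...but the first code point of a hypothetical Plane 17 is not.
plane_17_start = 17 << 16
```

Calling `to_surrogates(plane_17_start)` raises `ValueError`, which is the architectural limit the joke about a "Supplementary Surrogates Plane" points at.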
From pgcon6 at msn.com Wed Feb 26 09:12:29 2025
From: pgcon6 at msn.com (Peter Constable)
Date: Wed, 26 Feb 2025 15:12:29 +0000
Subject: Please Make a proposal for Aymara and Paucartambo
In-Reply-To:
References:
Message-ID:

Hello, Matthew

Anyone who would like to see new characters or scripts encoded and has access to the relevant information about the script can write a proposal for encoding. See this page for details:

https://www.unicode.org/pending/proposals.html

Peter

From: Unicode on behalf of Matthew Tameirao via Unicode
Date: Saturday, February 22, 2025 at 10:07 PM
To: unicode at corp.unicode.org
Subject: Please Make a proposal for Aymara and Paucartambo

Hello, Please make 2 Scripts for a Unicode Proposal, these 2 scripts are Aymara and Paucartambo. Please make a Proposal for them.

Thank You!

From asmusf at ix.netcom.com Wed Feb 26 10:54:27 2025
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Wed, 26 Feb 2025 08:54:27 -0800
Subject: Please Make a proposal for Aymara and Paucartambo
In-Reply-To:
References:
Message-ID: <16521af5-8b0c-4ac4-bef0-9e60ce2410f7@ix.netcom.com>

Anyone interested in a particular script can also use this list to try to find other interested parties that might join in a proposal. Not everybody who sees a need for support of a given script has the background to work out a proposal on their own.

But it might be more likely to find support among the community of people already interested in the script, perhaps including scholars of it. Those are perhaps not as likely to be members of this mailing list.

Script proposals originate from the script community, not from inside the Unicode Consortium. Which means you cannot "request" a proposal for their encoding by posting here.

Under ISO 15924 it is in principle possible to request a script code be registered. That registration consists of just the 4-letter abbreviation for a given name.
Normally, that is not something that is necessary as the Unicode Technical Committee will request that implicitly after approving a proposal for encoding.

A./

On 2/26/2025 7:12 AM, Peter Constable via Unicode wrote:
> Hello, Matthew
>
> Anyone who would like to see new characters or scripts encoded and has
> access to the relevant information about the script can write a
> proposal for encoding. See this page for details:
>
> https://www.unicode.org/pending/proposals.html
>
>
> Peter
>
> *From: *Unicode on behalf of
> Matthew Tameirao via Unicode
> *Date: *Saturday, February 22, 2025 at 10:07 PM
> *To: *unicode at corp.unicode.org
> *Subject: *Please Make a proposal for Aymara and Paucartambo
>
> Hello, Please make 2 Scripts for a Unicode Proposal, these 2 scripts
> are Aymara and Paucartambo. Please make a Proposal for them.
>
> Thank You!

From markus.icu at gmail.com Wed Feb 26 12:44:57 2025
From: markus.icu at gmail.com (Markus Scherer)
Date: Wed, 26 Feb 2025 10:44:57 -0800
Subject: Please Make a proposal for Aymara and Paucartambo
In-Reply-To: <16521af5-8b0c-4ac4-bef0-9e60ce2410f7@ix.netcom.com>
References: <16521af5-8b0c-4ac4-bef0-9e60ce2410f7@ix.netcom.com>
Message-ID:

On Wed, Feb 26, 2025 at 8:59 AM Asmus Freytag via Unicode < unicode at corp.unicode.org> wrote:

> Under ISO 15924 it is in principle possible to request a script code be
> registered. That registration consists of just the 4-letter abbreviation
> for a given name. Normally, that is not something that is necessary as the
> Unicode Technical Committee will request that implicitly after approving a
> proposal for encoding.
>

In fact, the registrar will in general want to see a well-formed proposal that is recommended by the Script Encoding Working Group which (a) indicates that the script is in fact eligible for encoding (as a separate script) and (b) discusses how to name the script so that the script code can be mnemonic.

So asking for a script code comes very late in the process -- and is also basically automatic, as the registrar is looped into these things.

markus