From doug at ewellic.org Sat Apr 1 00:07:25 2023
From: doug at ewellic.org (Doug Ewell)
Date: Sat, 1 Apr 2023 05:07:25 +0000
Subject: How do U+2571..U+2573 connect?
In-Reply-To:
References: <03d4f71d-9d08-01bc-4227-8cf684197226@ix.netcom.com>
Message-ID:

Apologies; I misspoke. T.101-G2 is an encoding for Videotex, not teletext.

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org

From: Kent Karlsson
Sent: Friday, March 31, 2023 17:44
To: Doug Ewell
Cc: Manuel Strehl; unicode at corp.unicode.org
Subject: Re: How do U+2571..U+2573 connect?

I don't see them in any of the G2 character sets in https://www.etsi.org/deliver/etsi_en/300700_300799/300706/01.02.01_60/en_300706v010201p.pdf (ETSI EN 300 706 V1.2.1 (2003-04) European Standard (Telecommunications series) Enhanced Teletext specification).

From harjitmoe at outlook.com Sat Apr 1 02:44:44 2023
From: harjitmoe at outlook.com (Harriet Riddle)
Date: Sat, 1 Apr 2023 08:44:44 +0100
Subject: Inverted asterism
In-Reply-To:
References:
Message-ID:

Indeed, the only encoding I can think of right now (except PUA schemes) which includes a turned asterism is the font-specific encoding for the Bookshelf Symbol 7 font.

--Har.

Doug Ewell via Unicode wrote:
> David Starner wrote:
>
>> There doesn't seem to be an inverted asterism in Unicode. Is there a good reason there's not?
>
> Probably the same reason as always: it wasn't in a known character set 30 years ago, and nobody has successfully proposed it since then.
>
> --
> Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org

From asmusf at ix.netcom.com Sat Apr 1 03:01:10 2023
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Sat, 1 Apr 2023 01:01:10 -0700
Subject: Inverted asterism
In-Reply-To:
References:
Message-ID:

Which gives us another data point that someone thought there was a use case sufficient enough to provide a digital implementation.
A./

From harjitmoe at outlook.com Sat Apr 1 03:35:15 2023
From: harjitmoe at outlook.com (Harriet Riddle)
Date: Sat, 1 Apr 2023 09:35:15 +0100
Subject: How do U+2571..U+2573 connect?
In-Reply-To:
References: <03d4f71d-9d08-01bc-4227-8cf684197226@ix.netcom.com>
Message-ID:

U+2571, U+2572 and U+2573 are 0xA2AC, 0xA2AD and 0xA2AE respectively in Big5 (the dominant pre-Unicode encoding for Traditional Chinese). They also appear in GBK (Simplified Chinese) as 0xA875, 0xA876 and 0xA877 respectively, though GBK is post-Unicode (extending GB2312-as-EUC-CN to include the entire original URO and some other (non-Hangul) chars from other CJK encodings, in this case from Big5).

IBM calls them SH020080, SH030080 and SH040080 respectively (the "8" here is just a fullwidth attribute, but they don't seem to appear in single-byte code pages as e.g. SH020000 anywhere). They appear, unsurprisingly, in the Traditional Chinese code pages, and in the versions of the Simplified Chinese code pages expanded for GBK; they also appear in the more expanded versions of Japanese EBCDIC for some reason.
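The Big5 and GBK byte values quoted above can be spot-checked with a short script. This is only a sketch: it assumes that Python's bundled `big5` and `gbk` codecs follow the same mappings as the tables the message refers to, and the names `LEGACY_BYTES` and `decode_all` are made up for this illustration.

```python
# Byte sequences as cited in the message above (assumed to match the
# mapping tables shipped with Python's "big5" and "gbk" codecs).
LEGACY_BYTES = {
    "big5": [b"\xa2\xac", b"\xa2\xad", b"\xa2\xae"],
    "gbk": [b"\xa8\x75", b"\xa8\x76", b"\xa8\x77"],
}


def decode_all(codec: str) -> list[str]:
    """Decode each cited byte pair and report it in U+XXXX notation."""
    return [f"U+{ord(raw.decode(codec)):04X}" for raw in LEGACY_BYTES[codec]]


if __name__ == "__main__":
    for codec in LEGACY_BYTES:
        print(codec, decode_all(codec))
```

If the codecs agree with the message, both rows list the three box drawing diagonals in the same order.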
Also, yes, U+2571 and U+2572 (but not U+2573) appear in the G2 set of Videotex Data Syntax 3 / NAPLPS (ITU T.101 Annex D, ANSI X3.110:1983, CSA T500:1983, FIPS PUB 121), which has ISO-IR registration numbers 99 (registered by ANSI and later withdrawn in favour of the redundant 128) and 128 (registered by the ITU). Although it is itself a modified version of the ISO 6937 set, the base ISO 6937 set doesn't include these characters.

--Har

Rebecca Bettencourt via Unicode wrote:
> These three box drawing diagonals appear in at least:
> - Amstrad CPC
> - Mattel Aquarius
> - Atari 8-bit
> - MSX
> - PETSCII
> - Kaypro
> - Sharp MZ
> - Ohio Scientific
> - Robotron
>
> See page 11 of:
> https://www.unicode.org/L2/L2019/19025-aux-LegacyComputingSources.pdf
>
> See page 5 of:
> https://www.unicode.org/L2/L2021/21235-terminals-supplement-sources.pdf
>
> They don't appear in Teletext.
>
> As to how they came to be in Unicode originally, I don't know. Probably some IBM or DEC character set.
>
> -- Rebecca Bettencourt

From sosipiuk at gmail.com Sat Apr 1 04:45:00 2023
From: sosipiuk at gmail.com (Sławomir Osipiuk)
Date: Sat, 1 Apr 2023 05:45:00 -0400
Subject: Proposal to update the UTF-18 specification (RFC 4042)
Message-ID:

Background and Motivation:

UTF-18, the UCS Transformation Format - 18-bit, is specified in IETF RFC 4042. UTF-18 and UTF-9 are intended to provide efficient storage and processing of UCS/Unicode text in nonet-based environments. Of the two formats, UTF-18 is notably much simpler, as it represents any UCS code point with exactly two nonets; that is, exactly 18 bits. A curious limitation of UTF-18 is that it is incapable of representing all potential UCS code points. At the time RFC 4042 was written, the ranges of code points representable by UTF-18 included all then-existing non-private assigned characters.
This situation changed in 2020, with the release of ISO/IEC 10646:2020 and Unicode 13.0, which assigned characters to code points in the U+30000 to U+3FFFF range, known as Plane 3, or the Tertiary Ideographic Plane (TIP). The obvious values for representing these new characters in UTF-18 were defined by RFC 4042 to instead represent characters in Plane 14, the Supplementary Special Purpose Plane (SSP). This design decision was reasonable at the time, driven by the necessity of representing existing SSP characters. Unfortunately, it makes TIP characters impossible to encode in UTF-18 as it is currently specified. RFC 4042 explicitly forbids the use of any surrogate-pair mechanism to increase the number of representable code points.

However, with a minor modification, UTF-18 can be made to represent not only all currently assigned characters, but also all code points which have been roadmapped for future assignment. This change can ensure the practical viability of UTF-18 as a reliable storage format for the foreseeable future in all environments where nonet-based or octodectet-based storage and/or processing is ideal. Furthermore, this proposed updated UTF-18 definition holds to the fundamental design and spirit of the original, and presents a clear and straightforward upgrade path for existing UTF-18 implementations.

Technical Details:

The only amendments to the existing specification as given by RFC 4042 are:

- Codepoint values in the range U+0000 - U+3FFFF are copied as the same value into a UTF-18 value.
- Codepoint values in the range U+E0000 - U+E03FF are copied as values 0xdc00 - 0xdfff; that is, these values are shifted by 0xd2400.

This strategy allows the representation of all code points in Planes 0, 1, 2, and now 3 (TIP), as well as the portion of Plane 14 (SSP) encompassing all its currently assigned characters.
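The two amended rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of the proposal or of RFC 4042: the function names and constants are hypothetical, and rejecting surrogate-range inputs and the unused code units 0xD800 - 0xDBFF is a choice made here, since the text does not say how they should be treated.

```python
# Hypothetical constants for the proposed (not the RFC 4042) range mapping.
SSP_FIRST, SSP_LAST = 0xE0000, 0xE03FF    # SSP sub-range covered by the proposal
UNIT_FIRST, UNIT_LAST = 0xDC00, 0xDFFF    # code units that carry that sub-range
SSP_SHIFT = SSP_FIRST - UNIT_FIRST        # 0xD2400, the shift quoted above
MAX_UNIT = 0x3FFFF                        # largest 18-bit value


def utf18_encode(cp: int) -> int:
    """Map one code point to one 18-bit code unit under the proposed rules."""
    if SSP_FIRST <= cp <= SSP_LAST:
        return cp - SSP_SHIFT
    if 0xD800 <= cp <= 0xDFFF:
        # Assumption: surrogate code points are not characters, so refuse them.
        raise ValueError(f"U+{cp:04X} is a surrogate code point")
    if 0 <= cp <= MAX_UNIT:
        return cp
    raise ValueError(f"U+{cp:04X} is not representable")


def utf18_decode(unit: int) -> int:
    """Map one 18-bit code unit back to its code point."""
    if not 0 <= unit <= MAX_UNIT:
        raise ValueError(f"0x{unit:x} is not an 18-bit value")
    if UNIT_FIRST <= unit <= UNIT_LAST:
        return unit + SSP_SHIFT
    if 0xD800 <= unit < UNIT_FIRST:
        # Assumption: these units are left unassigned by the proposal.
        raise ValueError(f"0x{unit:x} is not assigned")
    return unit
```

Because each code point still maps to exactly one code unit, a round trip through `utf18_decode(utf18_encode(cp))` returns `cp` for every accepted input.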
For greater clarity, it is noted that this strategy does not make use of a surrogate mechanism as in UTF-16 (there is still a one-to-one correspondence between code points and code units), nor does it assign code points from the reserved U+DB00 to U+DFFF range (though it does make use of code units in that range). The fundamental definition of UTF-18 is not changed. Only some values involved in the range mapping between code points and code units are altered from the original definition in order to more efficiently cover assigned code points.

The full updated specification may be obtained by running the UNIX command:

    wget -q -O - https://www.rfc-editor.org/rfc/rfc4042.txt \
    | sed -E 's/EFF/E03/;s/0x3[0f]([0f]{3})/0xd\1/g; s/d0/dc/;s/2F/3F/;s/600/156/;s/700/d24/'

Upgrade Considerations:

Remarkably, and very fortunately, a recent extensive search of all publicly available UTF-18 data has found no instances of UCS values in the U+E0000 to U+EFFFF range; that is, there is currently zero use of SSP characters in publicly-visible UTF-18 environments. While an accurate measure of SSP character usage in private UTF-18 environments cannot be determined, it can reasonably be presumed to closely correspond to the public usage, thus, to be extremely low or zero. This presents an extraordinary, but time-sensitive, opportunity to upgrade UTF-18 systems with the enhancement herein described with minimal disruption and effort.

All upgradable software which processes UTF-18 data can be upgraded to the new range mappings quickly and simply. Reasonable program code implementations would most likely store the range limits and offset as constant values which can be adjusted in source code, with software re-compiled and re-deployed at low risk. This likewise applies to strategies which use bit-masking to effect the range mapping.
The trivial nature of the change means that it is suitable for a point release, and can be easily back-ported to previous major releases should that be necessary.

Where existing stored user data lacks any instances of SSP characters, there is nothing required to bring the data into compliance with the updated standard, and such data may be considered de facto "forwards-compatible". This is the crux of the current opportunity, and why time is of the essence. As the corpus of UTF-18 data grows with time, SSP characters will almost certainly be introduced in more environments, complicating the upgrade process.

In environments where existing UTF-18 data already includes such characters, a suitable upgrade strategy can still be developed, of course, and should be developed without delay to ensure compatibility with the latest standard going forward. It is most desirable that UTF-18 data be cleanly interchangeable between UTF-18 compliant systems. As the change is simply an offset applied to a specific subset of UCS values, an algorithm to update data will in general be simple to develop, test, and execute. Implementors are urged to examine their existing environments, software, and user data to determine the best course, and strongly consider an in-place upgrade of the software (and, if required, data) wherever feasible.

UTF-18 systems which cannot feasibly be upgraded will continue to function as expected with any data that does not include SIP or TIP code points. They should be carefully monitored for potential errors whenever data interchange with upgraded systems occurs. Data-sanitization strategies may be required, depending on the potential severity of mishandling SIP and TIP characters.
Conclusion:

The modification proposed here to the UTF-18 specification is an easy-to-implement enhancement that allows UTF-18 to cover the entire present non-private UCS character repertoire, ensuring that UTF-18 continues to remain as technically viable and relevant as it ever was in the face of continued development of the UCS and Unicode.

From kent.b.karlsson at bahnhof.se Sat Apr 1 14:26:48 2023
From: kent.b.karlsson at bahnhof.se (Kent Karlsson)
Date: Sat, 1 Apr 2023 21:26:48 +0200
Subject: Missing Latin superscript lowercase letters
In-Reply-To: <36980084-ad59-d8f2-378d-35116bb0d770@ix.netcom.com>
References: <1317264669.2762677.1678192549210@email.ionos.de> <40E26F97-2412-4BB6-8056-A628D7E5200E@bahnhof.se> <046e15c0-8015-10d6-8720-b1ee3cc431e7@ix.netcom.com> <36980084-ad59-d8f2-378d-35116bb0d770@ix.netcom.com>
Message-ID: <48B734CC-B191-4B80-BB9E-30B86BCB1FB7@bahnhof.se>

> On 25 March 2023 at 22:34, Asmus Freytag via Unicode wrote:
>
>> >for unit symbols. When used with temperature, it's interesting to note that not all temperature scales use it consistently. You don't see it with Fahrenheit very often, for example, reflecting differences in traditional keyboard layouts.
>>
>> Ok, let's digress a bit... I do see that too, in news articles (in web apps) from USA and British news companies and see also "C" when degrees Celsius is meant. But writing farad (F) or coulomb (C) when referring to temperature is just horrible, and only embarrassing for the journalist who wrote that. (Another related horror is "kph", and there you cannot even blame keyboard layouts.)
>
> I think it goes a bit too far to assume that any and all unit abbreviations have to be in the SI notation always.

Unit designations are not abbreviations. They are mathematical symbols, like "lim", "sin", "arctan", "?", "+", "24", ... And they participate in arithmetic/math expressions, like division, multiplication (including integer powers), and multiplication with numerical constants (like "42.5") and variables. While it is permissible to, say, write "sns" instead of "sin" for the sine function, it is not a great idea to do so. Even worse would be to write "lim" instead of "sin" for the sine function (which is directly comparable to the example with units above).

The SI symbols are standardized, in a standard I'd say is the most important in the modern world. More important than Unicode/10646? It is a bit odd that representatives of a rather important IT standard do not recognize the importance of the most important, overall, standard in the world today (and for the foreseeable future).

> I'm sure there are places where there are regulations that define the use of specific abbreviations and in any contexts where they apply to SI, you would be free to read "k" as kilo and "kph" as kilo-ph (and then reject that as undefined). The same is not true for ordinary everyday usage in places where SI units aren't customary.
>
> Likewise, the "ph" suffix to mean "per hour" is well established in places, while "/h" is not. That said, given that usage, I'd personally prefer kmph over kph.

I have never seen "p" as the division operator in a properly formed mathematical expression. Permitted? Yes, of course. But "rare and unusual", and definitely a bad idea.

> For example, in the weather forecast, 80F never refers to capacity, is understood by the audience, and therefore there's no objection to that usage on ground of confusion with SI units.

That is not it. The thing is to keep with conventional, even standardised, notation or not.

Even when keeping with conventional notation for units, there are ambiguities that need to be resolved by context: e.g. is B bel or byte? With a prefix (at a minimum, usually there is much more context), the ambiguity is resolved, there are no decibyte nor any megabinarybel (theoretically, there is megabel, but in practice not). If widening to chemistry, B could stand for boron; while not a unit of measure, it will have the same style as a (properly written) unit symbol. But there is no need to diverge from standard (pun intended) practice just because resolution by context is possible for the divergent notation.

Your example is just like if one were to use V for hydrogen, just because one would know from a particular context that hydrogen is meant and not vanadium. I think if anyone were to write V and mean hydrogen, there would be heavy criticism.

True, people may err in various ways. But I'm referring to people who should know better, like w.r.t. temperature: journalists and weather presenters.

(Nit: I've even seen F° and C°, which looks even more, ummm, uneducated. And that in a worldwide well-known weather app. Go find! :-) They use technology from another company that gives prognosis for the amount of dandruff in the air... (but not in English). To make the whole thing even more comical, they pride themselves on being sooo accurate... Two all-year-round April Fools' jokes :-).)

> However, usage is not consistent, you see it both with and without the degree sign, and without naming names, websites by academic institutions are just as likely to leave it off as popular websites are likely to add it.
>
> As you can see, actual usage is all over the place and as Unicode is not prescriptive, we simply deal with what's out there.

But you do complain when things "out there" are not up to par. Like in the ZWJ discussion not long ago.

/Kent K
From asmusf at ix.netcom.com Sat Apr 1 22:04:17 2023
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Sat, 1 Apr 2023 20:04:17 -0700
Subject: Missing Latin superscript lowercase letters
In-Reply-To: <48B734CC-B191-4B80-BB9E-30B86BCB1FB7@bahnhof.se>
References: <1317264669.2762677.1678192549210@email.ionos.de> <40E26F97-2412-4BB6-8056-A628D7E5200E@bahnhof.se> <046e15c0-8015-10d6-8720-b1ee3cc431e7@ix.netcom.com> <36980084-ad59-d8f2-378d-35116bb0d770@ix.netcom.com> <48B734CC-B191-4B80-BB9E-30B86BCB1FB7@bahnhof.se>
Message-ID:

On 4/1/2023 12:26 PM, Kent Karlsson wrote:
>> On 25 March 2023 at 22:34, Asmus Freytag via Unicode wrote:
>>
>>> >for unit symbols. When used with temperature, it's interesting to note that not all temperature scales use it consistently. You don't see it with Fahrenheit very often, for example, reflecting differences in traditional keyboard layouts.
>>>
>>> Ok, let's digress a bit... I do see that too, in news articles (in web apps) from USA and British news companies and see also "C" when degrees Celsius is meant. But writing farad (F) or coulomb (C) when referring to temperature is just horrible, and only embarrassing for the journalist who wrote that. (Another related horror is "kph", and there you cannot even blame keyboard layouts.)
>>
>> I think it goes a bit too far to assume that any and all unit abbreviations have to be in the SI notation always.
>
> Unit designations are not abbreviations. They are mathematical symbols, like "lim", "sin", "arctan", "?", "+", "24", ... And they participate in arithmetic/math expressions, like division, multiplication (including integer powers), and multiplication with numerical constants (like "42.5") and variables. While it is permissible to, say, write "sns" instead of "sin" for the sine function, it is not a great idea to do so. Even worse would be to write "lim" instead of "sin" for the sine function (which is directly comparable to the example with units above).

In scientific usage.

> The SI symbols are standardized, in a standard I'd say is the most important in the modern world. More important than Unicode/10646? It is a bit odd that representatives of a rather important IT standard do not recognize the importance of the most important, overall, standard in the world today (and for the foreseeable future).

It is a standard, but that doesn't mean that it's the only standard for expressing weights and measures. There may be jurisdictions where it's illegal to use another standard in a commercial or legal context, but that still doesn't mean that other conventions don't exist whether instead or in parallel to the SI. Those details depend on the jurisdiction.

>> I'm sure there are places where there are regulations that define the use of specific abbreviations and in any contexts where they apply to SI, you would be free to read "k" as kilo and "kph" as kilo-ph (and then reject that as undefined). The same is not true for ordinary everyday usage in places where SI units aren't customary.
>>
>> Likewise, the "ph" suffix to mean "per hour" is well established in places, while "/h" is not. That said, given that usage, I'd personally prefer kmph over kph.
>
> I have never seen "p" as the division operator in a properly formed mathematical expression. Permitted? Yes, of course. But "rare and unusual", and definitely a bad idea.

It's definitely common and the accepted usage for at least 330 million people.

>> For example, in the weather forecast, 80F never refers to capacity, is understood by the audience, and therefore there's no objection to that usage on ground of confusion with SI units.
>
> That is not it. The thing is to keep with conventional, even standardised, notation or not.

The use of 80F is totally "standard" in those places where it is used. No amount of wishing something else will change that. (I suspect that there are standards that codify that usage, but I'm not willing to dig that deep for the purpose of this discussion: it's enough that it's a commonly used convention).

> Even when keeping with conventional notation for units, there are ambiguities that need to be resolved by context: e.g. is B bel or byte? With a prefix (at a minimum, usually there is much more context), the ambiguity is resolved, there are no decibyte nor any megabinarybel (theoretically, there is megabel, but in practice not). If widening to chemistry, B could stand for boron; while not a unit of measure, it will have the same style as a (properly written) unit symbol. But there is no need to diverge from standard (pun intended) practice just because resolution by context is possible for the divergent notation.

We are not discussing someone creating an alternate system just to be different. We are talking about alternate systems that exist and are widely understood among their users, and about jurisdictions that have decided not to mandate the use of SI (except for selected purposes).

> Your example is just like if one were to use V for hydrogen, just because one would know from a particular context that hydrogen is meant and not vanadium. I think if anyone were to write V and mean hydrogen, there would be heavy criticism.

No, the example would be if Germany had adopted "S" for "Sauerstoff" instead of "O". This did not happen, for various reasons, but in theory you can have a translated system of element names that is as self-consistent as the standard one. (Just like some countries translate character names in 10646).

> True, people may err in various ways. But I'm referring to people who should know better, like w.r.t. temperature: journalists and weather presenters.

There is no "better" here. Presenting weather forecasts in temperatures in centigrade to an American audience is pretty near useless.

> (Nit: I've even seen F° and C°, which looks even more, ummm, uneducated. And that in a worldwide well-known weather app. Go find! :-) They use technology from another company that gives prognosis for the amount of dandruff in the air... (but not in English). To make the whole thing even more comical, they pride themselves on being sooo accurate... Two all-year-round April Fools' jokes :-).)
>
>> However, usage is not consistent, you see it both with and without the degree sign, and without naming names, websites by academic institutions are just as likely to leave it off as popular websites are likely to add it.
>>
>> As you can see, actual usage is all over the place and as Unicode is not prescriptive, we simply deal with what's out there.
>
> But you do complain when things "out there" are not up to par. Like in the ZWJ discussion not long ago.

There's a difference between being "descriptive" in your encoding (describing how certain text elements map to encoded character sequences whether or not these elements are preferred in any "prescriptive" system), and being prescriptive in what certain encoded elements are intended for.

A./

> /Kent K
From duerst at it.aoyama.ac.jp Sun Apr 2 06:50:01 2023
From: duerst at it.aoyama.ac.jp (Martin J. Dürst)
Date: Sun, 2 Apr 2023 20:50:01 +0900
Subject: The Spread of Writing: Every Year
Message-ID: <0a94e240-3835-9570-6140-928ae1d490be@it.aoyama.ac.jp>

Hello character and script enthusiasts,

I'm not sure how accurate this is (surely leaves out a lot of minor scripts), but found this interesting: https://www.youtube.com/watch?v=eUpJ4yVCNrI

From t0dd at protonmail.com Mon Apr 17 14:50:37 2023
From: t0dd at protonmail.com (t0dd)
Date: Mon, 17 Apr 2023 19:50:37 +0000
Subject: Chicago/MLA ellipsis versus the Unicode defined AP ellipsis
Message-ID: <682eadc0-87bc-71b0-9144-fadba65ee8c2@protonmail.com>

Hello all,

Narrative writers working in the English language, and in particular the US (I can't speak for the rest of the English-language world), are generally required to adhere to the Chicago Manual of Style (CMoS) when submitting manuscripts and screenplays for publication. News people generally follow the AP (Associated Press) style. The rub: they each use a different ellipsis. The CMoS requires three dots spaced apart. The AP, because news copy is space-conscious, requires dots tightly packed.

Other style guides follow one or the other, but most follow the Chicago style or they are indifferent. For example, in school many of you were required to follow the MLA style guide. That also requires a spaced-out "Chicago" ellipsis (I am just going to call it that from here on out). Conversely, if you wrote for the Psychology Review, you follow the APA style which adheres to the "AP" ellipsis.

Unicode only supplies one horizontal ellipsis: U+2026. The AP ellipsis. This ellipsis is constructed via three periods with no additional spacing: U+002E U+002E U+002E under the covers. (Spaces between the codes here have been added for readability.)

That construction is not sufficient. Ironically, the most commonly needed ellipsis is not the one defined by Unicode. The more common need is for something constructed with three periods separated by non-breaking spaces. I.e., something like U+002E U+00A0 U+002E U+00A0 U+002E. Again, treated as a solitary character and unbreakable. And, of course there are repercussions if it lives next to a sentence-ending period, or if it is adjacent to a quotation mark. Etc.

What most writers do to get around this issue is find-and-replace all ellipsis characters with three periods spaced out. But that doesn't word wrap correctly. Slightly more savvy writers find-and-replace all ellipsis characters with three periods separated by a non-breaking space (see above). Or they change the character spacing style within their word-processing application for their three-period "word". Or they just use the AP ellipsis and hope no one cares.

It should be noted that grammar and spell checkers see these user-generated constructions as errors.

This is ugly. There really needs to be a Unicode character that supports the Chicago ellipsis.

None of the word processing packages builds any robust workaround for this. LaTeX has an ellipsis package to work around this and the associated complexities (https://tug.ctan.org/macros/latex/contrib/ellipsis/ellipsis.pdf is really worth the read), but that's not ideal. LaTeX is not software designed for the Everyman.

I hear rumor that some typefaces come with stylistic alternatives to address this, but that's not the case with any typeface that I have ever had to use as required by a publisher (namely Times New Roman). Plus, that's . . . kludgy.

So . . . please. Someone. Advocate for supporting a spaced-out ellipsis so that all of us who have to adhere to a standard that is not the AP Style don't have to do bizarre find-and-replacey things or other workarounds. Newspapers are dead, haven't you heard?

We all have access to em-dashes and en-dashes and other dashes.
A Chicago-styled ellipsis (for lack of a better nomenclature) is way way overdue IMHO.

What think y'all? Note, I just joined the mailing list in order to voice this. Be kind please. :)

Cheers. -t

P.S. NOTE: This topic has been touched upon a bit in the past, but not quite exactly the same ask. (Reference: https://www.unicode.org/mail-arch/unicode-ml/y2006-m01/0164.html) That thread devolved into lovely poetry. Worth the read. ;) I digress . . .

--
t0dd

From asmusf at ix.netcom.com Mon Apr 17 15:56:56 2023
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Mon, 17 Apr 2023 13:56:56 -0700
Subject: Chicago/MLA ellipsis versus the Unicode defined AP ellipsis
In-Reply-To: <682eadc0-87bc-71b0-9144-fadba65ee8c2@protonmail.com>
References: <682eadc0-87bc-71b0-9144-fadba65ee8c2@protonmail.com>
Message-ID: <32e91881-40af-979a-1444-4ea50794cda3@ix.netcom.com>

Given the facts as stated, the conclusion would be that this should be proposed for a variation sequence. Logically, that is the best alternative when there needs to be a distinct appearance that is selectable by context, but where that context runs orthogonal to font selection.

The other consideration is that the two forms of the ellipsis are both otherwise identical in meaning. That is, if I cut and paste text from a news story into another document, the meaning doesn't change. Going in and adding/removing variation selectors would fine tune the appearance, which is what is desired, not change the nature of the punctuation.

The only question would be whether to standardize two sequences or only one. If two are defined, the original character (if not part of a variation sequence) would have no preferred rendition.

In order for any action to be taken, this would need to be written up as a proposal and submitted.
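For reference, the interim workaround described in the original post (replacing U+2026 with periods joined by U+00A0 NO-BREAK SPACE) can be sketched in a few lines of Python. The helper name `spaced_ellipsis` is hypothetical, and the sketch deliberately ignores the harder cases the post mentions, such as an adjacent sentence-ending period or quotation mark.

```python
import unicodedata

ELLIPSIS = "\u2026"  # HORIZONTAL ELLIPSIS
NBSP = "\u00a0"      # NO-BREAK SPACE


def spaced_ellipsis(text: str, sep: str = NBSP) -> str:
    """Replace each U+2026 with three periods joined by `sep`.

    Illustration only: spacing around the ellipsis and adjacent
    punctuation are left exactly as they were in the input.
    """
    return text.replace(ELLIPSIS, sep.join("..."))


if __name__ == "__main__":
    sample = "So" + ELLIPSIS + " please."
    # Print the resulting code points so the no-break spaces are visible.
    print([f"U+{ord(ch):04X}" for ch in spaced_ellipsis(sample)])
    # Compatibility normalization turns U+2026 into three plain periods.
    print(unicodedata.normalize("NFKC", ELLIPSIS) == "...")
```

Passing a different separator, for example `"\u202f"` (NARROW NO-BREAK SPACE), gives a tighter variant of the same construction.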
A./ PS: this is not so different from cases like upright vs. slanted integral signs in math. For those and similar examples, the Standard recognizes that the duplicating character codes would imply differences in semantics and that the choice needs to be made without the need to replace the entire font. Hence the solution of standardizing a variation sequence. On 4/17/2023 12:50 PM, t0dd via Unicode wrote: > Hello all, > > Narrative writers working in the English language, and in particular > the US (I can't speak for the rest of the English-language world), are > generally required to adhere to the Chicago Manual of Style (CMoS) > when submitting manuscripts and screenplays for publication. News > people generally follow the AP (Associated Press) style. The rub: they > each use a different ellipsis. The CMoS requires three dots spaced > apart. The AP, because news copy is space-conscious, requires dots > tightly packed. > > Other style guides follow one or the other, but most follow the the > Chicago style or they are indifferent. For example, in school many of > you were required to follow the MLA style guide. That also requires a > spaced-out "Chicago" ellipsis (I am just going to call it that from > here on out). Conversely, if you wrote for the Psychology Review, you > follow the APA style which adheres to the "AP" ellipsis. > > Unicode only supplies one horizontal ellipsis: U+2026. The AP > ellipsis. This ellipsis is constructed via three periods with no > additional spacing: U+002E U+002 EU+002E under the covers. (Spaces > between the codes here have been added for readability.) > > That construction is not sufficient. Ironically, the most commonly > needed ellipsis is not the one defined by Unicode. The more common > need is for something constructed with three-periods separated by > three non-breaking-spaces. I.e., something like U+002E U+00A0 U+002E > U+00A0 U+002E. Again, treated as a solitary character and unbreakable. 
> And, of course there are repercussions if it lives next to a > sentence-ending period, or if it is adjacent to a quotation mark. Etc. > > What most writers do to get around this issue is find-and-replace all > ellipsis characters with three periods spaced out. But that doesn't > word wrap correctly. Slightly more savvy writers find-and-replace all > ellipsis characters with three periods separated by a non-breaking > space (see above). Or they change the character spacing style within > their word-processing application for their three-period "word". Or > they just use the AP ellipsis and hope no one cares. > > It should be noted that grammar and spell checkers see these > user-generated constructions as errors. > > This is ugly. There really needs to be a Unicode character that > supports the Chicago ellipsis. > > None of the word processing packages builds any robust workaround for > this. LaTeX has an ellipsis package to work around this and the > associated complexities > (https://tug.ctan.org/macros/latex/contrib/ellipsis/ellipsis.pdf is > really worth the read), but that's not ideal. LaTeX is not software > designed for the Everyman. > > I hear rumor that some typefaces come with stylistic alternatives to > address this, but that's not the case with any typeface that I have > ever had to use as required by a publisher (namely New Times Roman). > Plus, that's . . . kludgy. > > So . . . please. Someone. Advocate for supporting a spaced-out > ellipsis so that all of us who have to adhere to a standard that is > not the AP Style don't have to do bizarre find-and-replacey things or > other workaronds. Newspapers are dead, haven't you heard? ? > > We all have access to an em-dashes and en-dashes and other dashes. A > Chicago-styled ellipsis (for lack of a better nomenclature) is way way > overdue IMHO > > What think y'all? Note, I just joined the mailing list in order to > voice this. Be kind please. :) > > Cheers. -t > > > P.S. 
> NOTE: This topic has been touched upon a bit in the past, but not quite exactly the same ask. (Reference: https://www.unicode.org/mail-arch/unicode-ml/y2006-m01/0164.html) That thread devolved into lovely poetry. Worth the read. ;) I digress . . .

From kenwhistler at sonic.net Mon Apr 17 17:37:43 2023
From: kenwhistler at sonic.net (Ken Whistler)
Date: Mon, 17 Apr 2023 15:37:43 -0700
Subject: Chicago/MLA ellipsis versus the Unicode defined AP ellipsis
In-Reply-To: <32e91881-40af-979a-1444-4ea50794cda3@ix.netcom.com>
References: <682eadc0-87bc-71b0-9144-fadba65ee8c2@protonmail.com> <32e91881-40af-979a-1444-4ea50794cda3@ix.netcom.com>
Message-ID: <163e55a1-4734-1093-6d18-cc9248d339d4@sonic.net>

Asmus,

I'm gonna disagree. Adding a variation sequence would just confuse existing practice, and wouldn't deal with the edge cases where the spaced-out ellipses bump into other punctuation. See a more nuanced discussion of the issue at:

https://cmosshoptalk.com/2019/07/30/dot-dot-dot-a-closer-look-at-the-ellipsis/

Basically, an ellipsis is an ellipsis is an ellipsis, sure, but when one gets to concerns about exact appearance in a publication, that becomes a copyedit issue, and standard practice is simply to insert the NBSP (or NNBSP, depending on preference) to space dots out to match the spec and prevent unwanted line breaks. It may be a bit of a PITA for somebody who uses ellipses in text to have to insert NBSP in some instances to follow the style guide, but as a copyedit issue, that basically falls into the same category, in my reckoning, as worrying about whether the periods are inside or outside of the quotation marks, for example.

One should not assume that plain text poured into a text renderer is automatically going to follow every last detail of a style guide such as CMOS.
Preparation for publication assumes markup for styling, of course, but may also require specialized handling for hyphenation (or prevention thereof) *and* attention to detail of spacing that might not be entirely handled automatically by a generic renderer.

So I'm not in favor of getting variation selectors involved here as well, which would likely just introduce more distinctions that wouldn't always work as expected but would likely require more hacky overrides in edge cases if used.

--Ken

On 4/17/2023 1:56 PM, Asmus Freytag via Unicode wrote:
> Given the facts as stated, the conclusion would be that this should be proposed for a variation sequence.

From asmusf at ix.netcom.com Mon Apr 17 23:13:58 2023
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Mon, 17 Apr 2023 21:13:58 -0700
Subject: Chicago/MLA ellipsis versus the Unicode defined AP ellipsis
In-Reply-To: <163e55a1-4734-1093-6d18-cc9248d339d4@sonic.net>
References: <682eadc0-87bc-71b0-9144-fadba65ee8c2@protonmail.com> <32e91881-40af-979a-1444-4ea50794cda3@ix.netcom.com> <163e55a1-4734-1093-6d18-cc9248d339d4@sonic.net>
Message-ID:

There are two arguments that would prevent either a new code point or a variation sequence from being a complete solution. One is the case of monospaced fonts, where most likely neither would result in something that's wider than a standard cell. (Although some people design fonts that are only "mostly" monospaced.)

While we are at the monospaced fonts, they don't work well with inserted spaces, as the gaps are already present. In fact it's clear that the CMoS is fairly wedded to ensuring that you get the same effect as typing three dots on a typewriter... Typewriters and monospaced fonts would get the desired effect automatically, also for adjacent punctuation. It is only when dealing with variable-width fonts that one would need to manually add spaces in order to achieve the typewritten appearance.

It would seem, in this light, that our discussion of Ellipsis (p.
277 in 15.0) is incomplete. Because, following your argument, we would argue that the encoding of an ellipsis as a sequence of alternating U+002E with NBSP or NNBSP is something we recommend (based on the desired spacing) even when, semantically, the sequence is fully an equivalent of what is otherwise encoded by the ellipsis.

I would suggest changing the following sentence to better map out the conventions.

"For example, in a monowidth font, a sequence of three full stops will be wider than the horizontal ellipsis, but in a typical proportional font, a full stop is very narrow and a sequence of three of them will be more tightly spaced than the dots in horizontal ellipsis."

In a monowidth font, a sequence of three /full stops/ will be wider than /horizontal ellipsis/, and may be the appropriate choice when following style guides that require more widely spaced dots. In this case, the spacing between the last dot and following punctuation would be as expected. In contrast, for a typical proportional font, a /full stop/ is very narrow and a sequence of three of them will be more tightly spaced than the dots in /horizontal ellipsis/. When following style guides calling for more widely spaced dots, established practice calls for separating the dots (and any surrounding punctuation) by either an NBSP or NNBSP.

A./

From mark at kli.org Tue Apr 18 13:43:39 2023
From: mark at kli.org (Mark E. Shoulson)
Date: Tue, 18 Apr 2023 14:43:39 -0400
Subject: Chicago/MLA ellipsis versus the Unicode defined AP ellipsis
In-Reply-To: <682eadc0-87bc-71b0-9144-fadba65ee8c2@protonmail.com>
References: <682eadc0-87bc-71b0-9144-fadba65ee8c2@protonmail.com>
Message-ID: <8bcd9aca-9ea5-6966-06cd-1f2df19f968f@shoulson.com>

I'm not entirely sure what the fuss is about. As Ken said, "an ellipsis is an ellipsis is an ellipsis," and I have to agree with that. U+2026 HORIZONTAL ELLIPSIS is only U+002E U+002E U+002E in *compatibility* decomposition, as far as I can see, and compatibility was never meant to cover fine points of typography. If you need an ellipsis that "looks" more bunched-up or more stretched-out, those are glyph variants, not even variation sequences, right? And besides, when actually typeset well, I think there isn't much difference as to what a well-typeset ellipsis should look like for a given font (that is, when looking at a book, on paper, do CMoS fans expect ellipses to be that much more spaced out than AP fans? Do they really look wrong to each other?) Maybe they do, as you speak about "submitting manuscripts."

But I'm still not getting it. When you say that the AP requires tightly-packed dots because news copy is space-conscious, that has to do with how they PRINT things, right? It doesn't matter how loosely kerned the dots are in a reporter's print-out, because that's not what takes up space in their columns: it's what they *print* that needs to be space-conscious.

So it sounds like the issue here mainly involves the electronic submission of manuscripts, when you email plain text in to the AP or a publisher. There's no such thing as "plain text" once it's on paper; print is as print looks. So in electronic correspondence, the AP prefers you write an ellipsis as U+002E U+002E U+002E, while the CMoS standard says you should use U+002E U+00A0 U+002E U+00A0 U+002E. Is this right, so far?
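
As a side illustration of the two points above (the compatibility decomposition of U+2026, and the two plain-text spellings), here is a minimal Python sketch using the standard unicodedata module. It is not part of the original message: `spaced_ellipsis` is a hypothetical helper name, and using U+00A0 as the gap is only the convention mentioned in this thread, not anything the Standard mandates.

```python
import unicodedata

ELLIPSIS = "\u2026"  # HORIZONTAL ELLIPSIS
NBSP = "\u00A0"      # NO-BREAK SPACE

# Compatibility normalization (NFKD/NFKC) maps U+2026 to three full stops;
# canonical normalization (NFD/NFC) leaves the character unchanged.
compat = unicodedata.normalize("NFKD", ELLIPSIS)
canon = unicodedata.normalize("NFD", ELLIPSIS)


def spaced_ellipsis(text, gap=NBSP):
    """Hypothetical copyedit helper: rewrite U+2026 and '...' as three
    full stops separated by `gap` (U+00A0 by default)."""
    dots = gap.join("...")  # '.' + gap + '.' + gap + '.'
    return text.replace(ELLIPSIS, dots).replace("...", dots)


if __name__ == "__main__":
    print([f"U+{ord(c):04X}" for c in compat])
    print([f"U+{ord(c):04X}" for c in spaced_ellipsis("Wait\u2026 what")])
```

Passing gap="\u202F" would give the NNBSP variant. The helper deliberately ignores the adjacent-punctuation cases (a sentence-ending period or a quotation mark next to the ellipsis), which is exactly where the thread says manual copyediting is still needed.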

In that case, *neither one* seems to be asking for U+2026 HORIZONTAL ELLIPSIS. This sounds like two standards for "how to write an ellipsis when all you have is periods and spaces." If you have an actual ellipsis character, then either standard can easily accept it whether it looks all crunched up or all spaced out or like a sparkly unicorn (how it looks is a matter for a font to determine, not "plain text.") Or either standard can decide not to accept it at all.

To be sure, this makes for some weirdness, when you have a monospace font, which by definition means that "all characters must take up the same width." If you consider an ellipsis to be a single character, then, yes, you'll get a horribly crunched-up ellipsis no matter whose standard you prefer. But that's what you get for taking a symbol that's designed to be wide and forcing it to conform.

IOW, I don't see this is something to do with Unicode. At best it's a glyph variant, if even that.

~mark

From asmusf at ix.netcom.com Tue Apr 18 16:22:00 2023
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Tue, 18 Apr 2023 14:22:00 -0700
Subject: Chicago/MLA ellipsis versus the Unicode defined AP ellipsis
In-Reply-To: <8bcd9aca-9ea5-6966-06cd-1f2df19f968f@shoulson.com>
References: <682eadc0-87bc-71b0-9144-fadba65ee8c2@protonmail.com> <8bcd9aca-9ea5-6966-06cd-1f2df19f968f@shoulson.com>
Message-ID: <93160574-b365-85a8-e36d-db3baf515661@ix.netcom.com>

On 4/18/2023 11:43 AM, Mark E. Shoulson via Unicode wrote:
> I'm not entirely sure what the fuss is about. As Ken said, "an ellipsis is an ellipsis is an ellipsis," and I have to agree with that. U+2026 HORIZONTAL ELLIPSIS is only U+002E U+002E U+002E in *compatibility* decomposition, as far as I can see, and compatibility was never meant to cover fine points of typography. If you need an ellipsis that "looks" more bunched-up or more stretched-out, those are glyph variants, not even variation sequences, right?
> And besides, when actually typeset well, I think there isn't much difference as to what a well-typeset ellipsis should look like for a given font (that is, when looking at a book, on paper, do CMoS fans expect ellipses to be that much more spaced out than AP fans? Do they really look wrong to each other?) Maybe they do, as you speak about "submitting manuscripts."

If you drill down, you find that the concept of an "ellipsis" and the abstract character for "horizontal ellipsis" (as currently conceived) are not fully congruent. And, as was reported here, there is established (and recommended) practice of using character sequences. (Plus there are subtle interactions with surrounding punctuation which are beyond simple glyph design variation.)

Taking all this together, the preferred action is to document such practice, instead of pretending that the "horizontal ellipsis" covers every possible expression of the concept of an "ellipsis". It happens to work well if you want something that is a single character and has a moderately compact representation, but it's not something that works, or is even preferable, in all contexts.

This is one of those instances when Unicode can and should be descriptive, instead of being prescriptive. As we speak, the text of the standard is being updated to counter the presumption that it's just a matter of always using "horizontal ellipsis" and trusting the glyph design in the selected font.

A./

From jukkakk at gmail.com Sat Apr 22 09:29:17 2023
From: jukkakk at gmail.com (Jukka K. Korpela)
Date: Sat, 22 Apr 2023 17:29:17 +0300
Subject: Chicago/MLA ellipsis versus the Unicode defined AP ellipsis
In-Reply-To: <682eadc0-87bc-71b0-9144-fadba65ee8c2@protonmail.com>
References: <682eadc0-87bc-71b0-9144-fadba65ee8c2@protonmail.com>
Message-ID:

CMoS describes the ellipsis as "spaced dots" for English, but it also describes the practice of "unspaced dots" for other languages.
Apparently, organizations and people may prefer unspaced dots for English, and probably also spaced dots for languages where unspaced dots are normal. Some (many?) languages have no standard on such issues.

A simple interpretation is that "spaced dots" is U+2026 and "unspaced dots" is U+002E U+002E U+002E. This way, the two practices can be expressed at the character level. This is significant in a text containing multiple languages, because it would be normal to preserve language-dependent style of punctuation (like we do for various quotation marks, for example).

I think the Unicode Standard is somewhat vague on this issue, possibly trying to cover a wider range of variation in the typography of ellipsis: "U+2026 horizontal ellipsis is the ordinary Unicode character intended for the representation of an ellipsis in text and typically shows the dots separated with a moderate degree of spacing." The word "moderate" might be read as suggesting that it's not really sufficient for spaced dots but too much for unspaced dots, i.e. a compromise or neutral position that is not suitable for either style. Perhaps it should say more clearly that U+2026 is expected to be rendered as spaced dots and that a sequence of three U+002E is expected to be rendered as unspaced dots. More detailed tuning of spacing is typographic and not a Unicode issue.

Jukka
https://jkorpela.fi/

From asmusf at ix.netcom.com Sat Apr 22 18:07:35 2023
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Sat, 22 Apr 2023 16:07:35 -0700
Subject: Chicago/MLA ellipsis versus the Unicode defined AP ellipsis
In-Reply-To:
References: <682eadc0-87bc-71b0-9144-fadba65ee8c2@protonmail.com>
Message-ID:

When copy editors tell you (on their websites) that they insist on the use of sequences to meet their style requirements, then that's good enough for me to settle what a _descriptive_ approach to this should be.
Which is to say that once you want specific control of spacing, use of a sequence is established practice. And we are in the process of acknowledging that in the text of the standard. A./ On 4/22/2023 7:29 AM, Jukka K. Korpela via Unicode wrote: > CMoS describes the ellipsis as ?spaced dots? for English, but it also > describes the practice of ?unspaced dots? for other languages. > Apparently, organizations and people?may prefer unspaced dots for > English, and probably also spaced dots for languages where unspaced > dots are normal. Some (many?) languages have no standard on such issues. > > A simple interpretation is that ?spaced dots? is U+2026 and ?unspaced > dots? is U+002E U+002E U+002E, This way, the two practices can be > expressed at the character level. This is significant in a text > containing multiple languages, because?it would be normal to preserve > language-dependent style of punctuation (like we do for various > quotations marks, for example). > > I think the Unicode Standard is somewhat vague in this issue, possibly > trying to cover a wider range of variation in the typography?of > ellipsis: ?U+2026 horizontal ellipsis is the ordinary Unicode > character intended for the representation > of an ellipsis in text and typically shows the dots separated with a > moderate degree of spacing.? The word ?moderate? might be read as > suggesting that it?s not really sufficient for spaced dots but too > much for unspaced dots, i.e. a compromise or neutral position that is > not suitable for either style. Perhaps it should say more clearly that > U+2026 is expected to be rendered as spaced dots and that a sequence > of three U+002E is expected to be renders as unspaced dots. More > detailed tuning of spacing is typographic and not a Unicode issue, > > Jukka > https://jkorpela.fi/ > > ma 17. huhtik. 
2023 klo 23.00 t0dd via Unicode > (unicode at corp.unicode.org) kirjoitti: > > Hello all, > > Narrative writers working in the English language, and in > particular the > US (I can't speak for the rest of the English-language world), are > generally required to adhere to the Chicago Manual of Style (CMoS) > when > submitting manuscripts and screenplays for publication. News people > generally follow the AP (Associated Press) style. The rub: they > each use > a different ellipsis. The CMoS requires three dots spaced apart. > The AP, > because news copy is space-conscious, requires dots tightly packed. > > Other style guides follow one or the other, but most follow the the > Chicago style or they are indifferent. For example, in school many of > you were required to follow the MLA style guide. That also requires a > spaced-out "Chicago" ellipsis (I am just going to call it that > from here > on out). Conversely, if you wrote for the Psychology Review, you > follow > the APA style which adheres to the "AP" ellipsis. > > Unicode only supplies one horizontal ellipsis: U+2026. The AP > ellipsis. > This ellipsis is constructed via three periods with no additional > spacing: U+002E U+002 EU+002E under the covers. (Spaces between the > codes here have been added for readability.) > > That construction is not sufficient. Ironically, the most commonly > needed ellipsis is not the one defined by Unicode. The more common > need > is for something constructed with three-periods separated by three > non-breaking-spaces. I.e., something like U+002E U+00A0 U+002E U+00A0 > U+002E. Again, treated as a solitary character and unbreakable. > And, of > course there are repercussions if it lives next to a sentence-ending > period, or if it is adjacent to a quotation mark. Etc. > > What most writers do to get around this issue is find-and-replace all > ellipsis characters with three periods spaced out. But that > doesn't word > wrap correctly. 
> > Slightly more savvy writers find-and-replace all ellipsis characters
> > with three periods separated by non-breaking spaces (see above). Or
> > they change the character spacing style within their word-processing
> > application for their three-period "word". Or they just use the AP
> > ellipsis and hope no one cares.
> >
> > It should be noted that grammar and spell checkers see these
> > user-generated constructions as errors.
> >
> > This is ugly. There really needs to be a Unicode character that
> > supports the Chicago ellipsis.
> >
> > None of the word processing packages builds any robust workaround
> > for this. LaTeX has an ellipsis package to work around this and the
> > associated complexities
> > (https://tug.ctan.org/macros/latex/contrib/ellipsis/ellipsis.pdf is
> > really worth the read), but that's not ideal. LaTeX is not software
> > designed for the Everyman.
> >
> > I hear rumor that some typefaces come with stylistic alternatives to
> > address this, but that's not the case with any typeface that I have
> > ever had to use as required by a publisher (namely Times New Roman).
> > Plus, that's . . . kludgy.
> >
> > So . . . please. Someone. Advocate for supporting a spaced-out
> > ellipsis so that all of us who have to adhere to a standard that is
> > not the AP Style don't have to do bizarre find-and-replacey things
> > or other workarounds. Newspapers are dead, haven't you heard?
> >
> > We all have access to em-dashes and en-dashes and other dashes. A
> > Chicago-styled ellipsis (for lack of a better nomenclature) is way
> > way overdue IMHO.
> >
> > What think y'all? Note, I just joined the mailing list in order to
> > voice this. Be kind please. :)
> >
> > Cheers. -t
> >
> > P.S. NOTE: This topic has been touched upon a bit in the past, but
> > not quite exactly the same ask. (Reference:
> > https://www.unicode.org/mail-arch/unicode-ml/y2006-m01/0164.html)
> > That thread devolved into lovely poetry. Worth the read. ;)
> > I digress . . .
> >
> > --
> > t0dd

From stepan at isc.org Sat Apr 22 09:56:29 2023
From: stepan at isc.org (Štěpán Balážik)
Date: Sat, 22 Apr 2023 14:56:29 +0000 (UTC)
Subject: U+1FAC3 Pregnant Man missing in the Table for Emoji With Explicit Gender Appearance in TR51
Message-ID: <511982793.2172187.1682175389542.JavaMail.zimbra@isc.org>

Hello,

The table "Emoji With Explicit Gender Appearance" [1] in section 2.3, Gender, should include 'U+1FAC3 pregnant man' in the Male column next to 'U+1F930 pregnant woman'. The table wasn't updated for Emoji 14.0, where U+1FAC3 was added.

I submitted this as a comment on the relevant PRI, only to realize that its closing date had already passed.

[1] https://www.unicode.org/reports/tr51/proposed.html#ExplicitGenderApperance

--
Štěpán Balážik
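
[Illustrative sketch for the ellipsis thread above.] The three representations discussed in that thread can be written out explicitly. The Python below is only a minimal sketch: the constant names and the helper `to_spaced_dots` are hypothetical, and only the code point sequences come from the messages. It also assumes that "spaced dots" means the NBSP-separated sequence t0dd describes, which is one reading among several.

```python
import re
import unicodedata

# Code point sequences quoted in the thread; the names are hypothetical.
HORIZONTAL_ELLIPSIS = "\u2026"        # single character U+2026
UNSPACED_DOTS = "\u002E" * 3          # U+002E U+002E U+002E
SPACED_DOTS = "\u00A0".join("." * 3)  # U+002E U+00A0 U+002E U+00A0 U+002E

# U+2026 has a compatibility decomposition to three full stops,
# so NFKC normalization turns it into the unspaced sequence.
assert unicodedata.normalize("NFKC", HORIZONTAL_ELLIPSIS) == UNSPACED_DOTS

# Matches exactly three consecutive full stops, not part of a longer run.
_THREE_DOTS = re.compile(r"(?<!\.)\.{3}(?!\.)")


def to_spaced_dots(text: str) -> str:
    """Replace U+2026 and runs of exactly three full stops with
    full stops joined by no-break spaces (the find-and-replace
    workaround described in the thread)."""
    text = text.replace(HORIZONTAL_ELLIPSIS, SPACED_DOTS)
    return _THREE_DOTS.sub(SPACED_DOTS, text)


if __name__ == "__main__":
    sample = "Wait\u2026 what..."
    # repr() makes the no-break spaces visible as \xa0
    print(repr(to_spaced_dots(sample)))
```

The no-break spaces keep the dots on one line, but, as the thread notes, the result is still a five-character sequence rather than a single character, so spell checkers and adjacent punctuation need separate handling.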