From unicode at unicode.org Wed Apr 1 20:20:09 2020 From: unicode at unicode.org (Karl Williamson via Unicode) Date: Wed, 1 Apr 2020 19:20:09 -0600 Subject: Emoji map of Colorado Message-ID: <5bd4004b-6716-2feb-49d3-4881c3bb1790@khwilliamson.com> https://www.reddit.com/r/Denver/comments/fsmn87/quarantine_boredom_my_emoji_map_of_colorado/?mc_cid=365e908e08&mc_eid=0700c8706b From unicode at unicode.org Thu Apr 2 16:29:42 2020 From: unicode at unicode.org (Doug Ewell via Unicode) Date: Thu, 2 Apr 2020 15:29:42 -0600 Subject: Emoji map of Colorado Message-ID: <000001d60935$d35a3770$7a0ea650$@ewellic.org> Karl Williamson shared: > https://www.reddit.com/r/Denver/comments/fsmn87/quarantine_boredom_my_emoji_map_of_colorado/?mc_cid=365e908e08&mc_eid=0700c8706b It's too bad this was only made available as an image, not as text, which of course it is. -- Doug Ewell | Thornton, CO, US | ewellic.org From costello at mitre.org Fri Apr 17 08:42:34 2020 From: costello at mitre.org (Costello, Roger L.) Date: Fri, 17 Apr 2020 13:42:34 +0000 Subject: Why is tab unaffected by font whereas space is affected? Message-ID: Hi Folks, Suppose I use a text editor and type this string: a b I set the font of that string to Calibri. I see a certain spacing between a and b. Then, I change the font to Courier. I see the spacing between a and b is significantly wider. The width of a space character in Courier is larger than the width of a space character in Calibri. Next, I place a tab character before the string: a b I set the font of the tab character to Calibri. I note the position of the string. Then, I set the font of the tab character to Courier. The string is at the same position. Why is that? Why is the space character affected by font whereas the tab character is not? /Roger From richard.wordingham at ntlworld.com Fri Apr 17 10:09:41 2020 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Fri, 17 Apr 2020 16:09:41 +0100 Subject: Why is tab unaffected by font whereas space is affected? In-Reply-To: References: Message-ID: <20200417160941.11e29ce9@JRWUBU2> On Fri, 17 Apr 2020 13:42:34 +0000 "Costello, Roger L. via Unicode" wrote: > Next, I place a tab character before the string: > > a b > > I set the font of the tab character to Calibri. I note the position > of the string. Then, I set the font of the tab character to Courier. > The string is at the same position. Why is that? Why is the space > character affected by font whereas the tab character is not? In some editors, e.g. Emacs 24.4.2, changing the font does significantly change the position. (For most Emacs modes, one nowadays has to specify the character as U+0009 to enter the tab as a character.) The first point is that tab positions are not well defined. For use with a fixed width font, it makes sense for tab positions to be every so many characters; the natural assumption is that the user may want a character-cell device. With proportional width fonts, this is not possible, and so I believe chaos is to be expected for positioning by tabs. Word processors may be expected to have tab positions defined in points or similar, rather than character widths. Richard. From otto.stolz at uni-konstanz.de Fri Apr 17 11:56:04 2020 From: otto.stolz at uni-konstanz.de (Otto Stolz) Date: Fri, 17 Apr 2020 18:56:04 +0200 Subject: Why is tab unaffected by font whereas space is affected? In-Reply-To: References: Message-ID: Namast? (greeting in Sars-Cov-2 times ?), am 2020-04-17 um 15:42 Uhr hat Roger L. 
Costello geschrieben:

> I place a tab character before the string:
> a b
> I set the font of the tab character to Calibri. I note the position of the string.
> Then, I set the font of the tab character to Courier. The string is at the same position.

Where is the problem? Imho, the Tab character signals that the next glyph shall be placed at the next tabulator position in the current line, irrespective of the current writing position (and irrespective of the current font). This way, the Tab key has worked since the times of type-writers and steam-computers. On the type-writer machine, you could choose the tabulator positions mechanically; in contemporary text processing apps, you will have some sort of paragraph format to set; simple plain-text editors, such as Edlin under Windows, tend to assume their own, fixed tabulator positions.

Best wishes,
  Otto Stolz

From wjgo_10009 at btinternet.com Fri Apr 17 09:15:23 2020
From: wjgo_10009 at btinternet.com (wjgo_10009 at btinternet.com)
Date: Fri, 17 Apr 2020 15:15:23 +0100 (BST)
Subject: QID emoji discussion and emoji encoding more generally too (from Re: Base character plus tag sequences)
Message-ID:

There is a new document about the QID Emoji proposal.

https://www.unicode.org/L2/L2020/20110-qid-emoji.pdf

Also, please remember the following.

https://www.youtube.com/watch?v=9ldSVbXbjl4

The review is due to close on Monday 20 April 2020.

William Overington

Thursday 16 April 2020

------ Original Message ------
From: "wjgo_10009 at btinternet.com via Unicode"
To: unicode at unicode.org
Sent: Monday, 2020 Mar 23 At 22:29
Subject: Base character plus tag sequences (from RE: Is the binaryness/textness of a data format a property?)

Doug Ewell wrote:

> When 137,468 private-use characters aren't enough?

In my opinion, a base character plus tag sequence has the potential to be used for many large-scale applications in the future. A base character plus tag sequence encoding has the advantage over a Private Use Area encoding (except for a prompt experimental use or for some applications) that the encoding can be unique, and thus interoperability is possible amongst people generally.

QID emoji is just the very start of applications, some not even dreamed of yet, for which a base character plus tag sequence encoding could be used. Once the restriction that the result of a specific encoding may only be a fixed image is removed, new information technology applications will become possible within text streams.

There is the QID Emoji Public Review, and issues like this can be explored there so that they will be before the Unicode Technical Committee when it assesses the responses to the public review. In my response of Monday 2 March 2020 I put forward an idea that could allow the idea of QID emoji to proceed yet without the disadvantages. No comment after that has been published as of the time of sending this post.

https://www.unicode.org/review/pri408/

Whatever your view on whether such ideas should be allowed to flourish and become mainstream in the future, I opine that it would be good for there to be more responses to the public review, so that as wide a range of views as possible are before the Unicode Technical Committee when it assesses the responses: not just on QID emoji as such, but on whether the underlying method of encoding a base character plus tag character sequence for large sets of items should be encouraged.
William Overington Monday 23 March 2020 From eliz at gnu.org Sat Apr 18 14:24:33 2020 From: eliz at gnu.org (Eli Zaretskii) Date: Sat, 18 Apr 2020 22:24:33 +0300 Subject: Why is tab unaffected by font whereas space is affected? In-Reply-To: <20200417160941.11e29ce9@JRWUBU2> (message from Richard Wordingham via Unicode on Fri, 17 Apr 2020 16:09:41 +0100) References: <20200417160941.11e29ce9@JRWUBU2> Message-ID: <83a738k366.fsf@gnu.org> > Date: Fri, 17 Apr 2020 16:09:41 +0100 > From: Richard Wordingham via Unicode > > In some editors, e.g. Emacs 24.4.2, changing the font does > significantly change the position. (For most Emacs modes, one nowadays > has to specify the character as U+0009 to enter the tab as a character.) Emacs uses a fixed number of space_width pixels to display a TAB. Since space_width varies with font, so does the width of a TAB. From jonathan.coxhead at gmail.com Sat Apr 18 17:26:13 2020 From: jonathan.coxhead at gmail.com (Jonathan Coxhead) Date: Sat, 18 Apr 2020 15:26:13 -0700 Subject: Why is tab unaffected by font whereas space is affected? In-Reply-To: <83a738k366.fsf@gnu.org> References: <20200417160941.11e29ce9@JRWUBU2> <83a738k366.fsf@gnu.org> Message-ID: Surely the reason is that tab is a control character, but space is a printing character. Fonts consist only of printing characters. Cheers ?? (Jonathan, bronze sponsor of ?) On Sat, Apr 18, 2020 at 12:30 PM Eli Zaretskii via Unicode < unicode at unicode.org> wrote: > > Date: Fri, 17 Apr 2020 16:09:41 +0100 > > From: Richard Wordingham via Unicode > > > > In some editors, e.g. Emacs 24.4.2, changing the font does > > significantly change the position. (For most Emacs modes, one nowadays > > has to specify the character as U+0009 to enter the tab as a character.) > > Emacs uses a fixed number of space_width pixels to display a TAB. > Since space_width varies with font, so does the width of a TAB. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From asmusf at ix.netcom.com Sat Apr 18 18:02:38 2020 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Sat, 18 Apr 2020 16:02:38 -0700 Subject: Why is tab unaffected by font whereas space is affected? In-Reply-To: References: <20200417160941.11e29ce9@JRWUBU2> <83a738k366.fsf@gnu.org> Message-ID: An HTML attachment was scrubbed... URL: From eliz at gnu.org Sat Apr 18 21:30:32 2020 From: eliz at gnu.org (Eli Zaretskii) Date: Sun, 19 Apr 2020 05:30:32 +0300 Subject: Why is tab unaffected by font whereas space is affected? In-Reply-To: (message from Jonathan Coxhead via Unicode on Sat, 18 Apr 2020 15:26:13 -0700) References: <20200417160941.11e29ce9@JRWUBU2> <83a738k366.fsf@gnu.org> Message-ID: <835zdwjjg7.fsf@gnu.org> > Date: Sat, 18 Apr 2020 15:26:13 -0700 > From: Jonathan Coxhead via Unicode > > Surely the reason is that tab is a control character, but space is a printing character. No. On a GUI display, Emacs shows a TAB as a stretch of white space of a suitable width, not as a string of space characters. From asmusf at ix.netcom.com Sun Apr 19 16:36:16 2020 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Sun, 19 Apr 2020 14:36:16 -0700 Subject: Why is tab unaffected by font whereas space is affected? In-Reply-To: <835zdwjjg7.fsf@gnu.org> References: <20200417160941.11e29ce9@JRWUBU2> <83a738k366.fsf@gnu.org> <835zdwjjg7.fsf@gnu.org> Message-ID: <6156bcbb-8c8e-165d-66c4-d46e1b9e1aff@ix.netcom.com> An HTML attachment was scrubbed... 
URL: From textexin at xencraft.com Sun Apr 19 18:08:42 2020 From: textexin at xencraft.com (Tex) Date: Sun, 19 Apr 2020 16:08:42 -0700 Subject: Why is tab unaffected by font whereas space is affected? In-Reply-To: <6156bcbb-8c8e-165d-66c4-d46e1b9e1aff@ix.netcom.com> References: <20200417160941.11e29ce9@JRWUBU2> <83a738k366.fsf@gnu.org> <835zdwjjg7.fsf@gnu.org> <6156bcbb-8c8e-165d-66c4-d46e1b9e1aff@ix.netcom.com> Message-ID: <000601d6169f$78f93f20$6aebbd60$@xencraft.com> I always thought the BEL sound should Doppler shift when the font character width expanded, so that it was deeper and lasted longer. Skinny fonts should be more of a chirp. From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Asmus Freytag via Unicode Sent: Sunday, April 19, 2020 2:36 PM To: unicode at unicode.org Subject: Re: Why is tab unaffected by font whereas space is affected? On 4/18/2020 7:30 PM, Eli Zaretskii via Unicode wrote: Date: Sat, 18 Apr 2020 15:26:13 -0700 From: Jonathan Coxhead via Unicode Surely the reason is that tab is a control character, but space is a printing character. No. On a GUI display, Emacs shows a TAB as a stretch of white space of a suitable width, not as a string of space characters. On a GUI display, no whitespace character is a rendered glyph. All are effectively control functions. However, for the spaces, the font would be queried for width and the layout would advance by that width (after it's been adjusted perhaps by processes such as justification). TAB requests that layout proceed from the next tab position beyond the width of the already laid out text. The locations of tab positions are not defined locally, inline, as is the amount to advance for a space character. Instead, there is an explicit or implicit property for the block of text (paragraph). If specific tab positions have not been provided (via user input, document temaplat or whatever), it's common to assume a regular grid of default positions. There is no standard for choosing these, but setting them apart by 5 or 8 times the typical width of a space character seems a common choice. Now, on the typewriter, where this technology originated, the widths of all characters, including spaces were fixed, and except for later advances like the IBM Selectric, there was also only ever a single typeface per machine. Even for plaintext editors, it's possible nowadays to globally change fonts (and sizes), so it's not surprising to see such choices affect the default tab positioning, as it appears to do in Emacs. This behavior is however still a "fallback" to the case of a full-featured text display, where tab stops and fonts can be explicitly set (per paragraph and per font-run respectively). Just because it's proven a useful workaround to scale default tab positions based on font size and typical width (using the space character as a proxy) it does not follow that tab-stop positioning is now a per-font run property. It is not. A./ -------------- next part -------------- An HTML attachment was scrubbed... URL: From kent.b.karlsson at bahnhof.se Sun Apr 19 19:01:36 2020 From: kent.b.karlsson at bahnhof.se (Kent Karlsson) Date: Mon, 20 Apr 2020 02:01:36 +0200 Subject: Why is tab unaffected by font whereas space is affected? In-Reply-To: <83a738k366.fsf@gnu.org> References: <20200417160941.11e29ce9@JRWUBU2> <83a738k366.fsf@gnu.org> Message-ID: > 18 apr. 2020 kl. 21:24 skrev Eli Zaretskii via Unicode : > >> Date: Fri, 17 Apr 2020 16:09:41 +0100 >> From: Richard Wordingham via Unicode >> >> In some editors, e.g. 
Emacs 24.4.2, changing the font does
>> significantly change the position. (For most Emacs modes, one nowadays
>> has to specify the character as U+0009 to enter the tab as a character.)
>
> Emacs uses a fixed number of space_width pixels to display a TAB.
> Since space_width varies with font, so does the width of a TAB.

1) Several text editors, primarily aimed at editing program source code, actually REPLACE an input (i.e. from the keyboard, not when read in from a file) tab character with a number of spaces, or can be set to do so. That is preferred for programming, with the (max) number of spaces specified in a style guide, since people use different editors (with different (default) tab positions), or use different tab positions even if using the same text editor. This is especially true for programming projects involving hundreds of people and/or spanning many years.

2) (Stepping away from source code editing.) What if one can change font or size “in the middle of lines”? With the emacs approach (but not talking about emacs here, but a context where one actually can change font or size in the middle of lines), one would then have a harder time getting text to line up over several lines by using tabs. That kind of defeats the purpose of tabs. Sure, tables are better, and are not bogged down with other (similar) problems associated with tabs and setting tab stops. But I see no reason to make tabs less useful than they have been. So no, the emacs approach to tabs “does not fly”.

3) Tab stops really used to be settable (either by proprietary escape sequences, or standard ECMA-48 control sequences) on some terminals. Granted, those terminals also had a very fixed size for their characters’ glyphs, and at that time one could not set the tab stops “between” character positions (which one can in some modern “document editors” (trying to use a neutral term here)).

In summary, tab stops are at particular positions in a displayed line of text, and do not depend on font changes, or font size changes. In some contexts one can set the tab stops (not just using default positions), and they “stay” over font changes and font size changes, until tab stops are (re)set by a tab setting “command” (or paragraph property, or similar mechanism, depending on system).

I would say that emacs stands alone with its rather strange interpretation. (As far as I know.)

/Kent Karlsson

From richard.wordingham at ntlworld.com Mon Apr 20 02:54:08 2020
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Mon, 20 Apr 2020 08:54:08 +0100
Subject: Why is tab unaffected by font whereas space is affected?
In-Reply-To: References: <20200417160941.11e29ce9@JRWUBU2> <83a738k366.fsf@gnu.org> Message-ID: <20200420085408.424c4f56@JRWUBU2>

On Mon, 20 Apr 2020 02:01:36 +0200 Kent Karlsson via Unicode wrote:

> I would say that emacs stands alone with its rather strange
> interpretation. (As far as I know.)

I believe it's the standard implementation for character cell displays.
All I can say from experimentation is that the tab positions are by default at positions defined as many character cells along. As there is a fair amount of code where tabs are functioning as a sequence of spaces, it remains desirable for the default to be preserved. I wouldn't be surprised if the Emacs behaviour were tunable - most of Emacs is! For example, when entering easily parsable tables of data perforce using proportional fonts (and there doesn't seem to be a monospace equivalent for scripts with conjuncts), it would be visually pleasing to have a tab setting option that didn't depend on counting character cells. Richard. From duerst at it.aoyama.ac.jp Mon Apr 20 03:25:50 2020 From: duerst at it.aoyama.ac.jp (=?UTF-8?Q?Martin_J=2e_D=c3=bcrst?=) Date: Mon, 20 Apr 2020 17:25:50 +0900 Subject: Why is tab unaffected by font whereas space is affected? In-Reply-To: References: <20200417160941.11e29ce9@JRWUBU2> <83a738k366.fsf@gnu.org> <835zdwjjg7.fsf@gnu.org> <6156bcbb-8c8e-165d-66c4-d46e1b9e1aff@ix.netcom.com> <000601d6169f$78f93f20$6aebbd60$@xencraft.com> Message-ID: <6d7cdac5-ae71-3b2e-750f-6553e784f9c2@it.aoyama.ac.jp> On 20/04/2020 12:47, Asmus Freytag via Unicode wrote: > On 4/19/2020 4:08 PM, Tex via Unicode wrote: >> I always thought the BEL sound should Doppler shift when the font character >> width expanded, so that it was deeper and lasted longer. Skinny fonts should >> be more of a chirp. > > Nice one. Yea. And it should probably get louder for bolder fonts. Regards, Martin. From eliz at gnu.org Mon Apr 20 09:18:40 2020 From: eliz at gnu.org (Eli Zaretskii) Date: Mon, 20 Apr 2020 17:18:40 +0300 Subject: Why is tab unaffected by font whereas space is affected? In-Reply-To: <6156bcbb-8c8e-165d-66c4-d46e1b9e1aff@ix.netcom.com> (message from Asmus Freytag via Unicode on Sun, 19 Apr 2020 14:36:16 -0700) References: <20200417160941.11e29ce9@JRWUBU2> <83a738k366.fsf@gnu.org> <835zdwjjg7.fsf@gnu.org> <6156bcbb-8c8e-165d-66c4-d46e1b9e1aff@ix.netcom.com> Message-ID: <831roii6kf.fsf@gnu.org> > Date: Sun, 19 Apr 2020 14:36:16 -0700 > From: Asmus Freytag via Unicode > > No. On a GUI display, Emacs shows a TAB as a stretch of white space > of a suitable width, not as a string of space characters. > > On a GUI display, no whitespace character is a rendered glyph. All are effectively control functions. Maybe conceptually. In practice, on GUI displays Emacs displays a space by using the font's glyph for that character. It doesn't do that for TABs (or other low ASCII control characters, for that matter). > TAB requests that layout proceed from the next tab position beyond the width of the already laid out text. Emacs is primarily a programmer's editor, and it allows the user to define the width of a tab stop, in units of the font's space character width. It then produces a stretch of white space whose width is computed to end at the next tab stop, as customized by the user (via a variable that is local to the buffer of text being displayed). > Just because it's proven a useful workaround to scale default tab positions based on font size and typical > width (using the space character as a proxy) it does not follow that tab-stop positioning is now a per-font run > property. It is not. I just described what Emacs does, because someone described that somewhat inaccurately. HTH. From eliz at gnu.org Mon Apr 20 09:36:51 2020 From: eliz at gnu.org (Eli Zaretskii) Date: Mon, 20 Apr 2020 17:36:51 +0300 Subject: Why is tab unaffected by font whereas space is affected? 
In-Reply-To: (message from Kent Karlsson on Mon, 20 Apr 2020 02:01:36 +0200) References: <20200417160941.11e29ce9@JRWUBU2> <83a738k366.fsf@gnu.org> Message-ID: <83tv1egr5o.fsf@gnu.org>

> From: Kent Karlsson
> Date: Mon, 20 Apr 2020 02:01:36 +0200
> Cc: Richard Wordingham , Unicode
>
> > Emacs uses a fixed number of space_width pixels to display a TAB.
> > Since space_width varies with font, so does the width of a TAB.
>
> 1) Several text editors, primarily aimed at editing program source code, actually REPLACE
> an input (i.e. from the keyboard, not when read in from a file) tab character with a number of spaces,
> or can be set to do so. That is preferred for programming, with the (max) number of spaces specified
> in a style guide, since people use different editors (with different (default) tab positions), or
> use different tab positions even if using the same text editor. This is especially true for programming
> projects involving hundreds of people and/or spanning many years.

What happens in the text and what is shown on display are different and separate things in Emacs. Emacs does have commands to replace TABs with spaces, and vice versa, but that is unrelated to how a TAB is displayed. The replacement commands, btw, make sure the text on display stays aligned the same, i.e. if you defined a TAB stop every 4 characters, a TAB at the beginning of a line will be replaced by 4 spaces, not 8.

> 2) (Stepping away from source code editing.) What if one can change font or size “in the middle of lines”?
> With the emacs approach (but not talking about emacs here, but a context where one actually can
> change font or size in the middle of lines), one would then have a harder time getting text to line up
> over several lines by using tabs. That kind of defeats the purpose of tabs. Sure, tables are better, and
> are not bogged down with other (similar) problems associated with tabs and setting tab stops. But I see no
> reason to make tabs less useful than they have been. So no, the emacs approach to tabs “does not fly”.

Yes. If someone wants the text to align, they will not mix variable fonts in the same text (and will probably use a fixed-pitch font anyway).

> 3) Tab stops really used to be settable (either by proprietary escape sequences, or standard ECMA-48
> control sequences) on some terminals. Granted, those terminals also had a very fixed size for their
> characters’ glyphs, and at that time one could not set the tab stops “between” character positions
> (which one can in some modern “document editors” (trying to use a neutral term here)).

As I explained, the number of space_width units in a tab stop can be controlled for each buffer of text. This affects how text is displayed, not its contents.

> In summary, tab stops are at particular positions in a displayed line of text, and do not depend on
> font changes, or font size changes. In some contexts one can set the tab stops (not just using
> default positions), and they “stay” over font changes and font size changes, until tab stops are
> (re)set by a tab setting “command” (or paragraph property, or similar mechanism, depending
> on system).

That would mean, for example, that if you make the font smaller, the tab stops stay in the same positions, pixel-wise? Isn't that strange?

> I would say that emacs stands alone with its rather strange interpretation. (As far as I know.)

Emacs is unique in many aspects; I guess this is one of them.
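A minimal sketch in C++ of the Emacs-style display rule described in this exchange (illustrative only: the names are invented, and this is not Emacs source code). The TAB is drawn as a whitespace stretch ending at the next multiple of the buffer-local tab width times the current font's space width, so the whole grid of stops scales with the font:

    #include <iostream>

    // Pixel width of the whitespace stretch displayed for a TAB whose left
    // edge is at x_px, given the current font's space width and a tab width
    // measured in multiples of that space width (all names hypothetical).
    int tab_advance_px(int x_px, int space_width_px, int tab_width_in_spaces) {
        int cell = space_width_px * tab_width_in_spaces;  // one tab cell in pixels
        int next_stop = ((x_px / cell) + 1) * cell;       // next stop strictly past x_px
        return next_stop - x_px;
    }

    int main() {
        // With an 8-space tab and a 7 px space glyph, a TAB at x = 30 advances
        // 26 px; change the font (space_width_px) and the stop grid moves too.
        std::cout << tab_advance_px(30, 7, 8) << "\n";  // prints 26
        return 0;
    }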
From sosipiuk at gmail.com Mon Apr 20 11:38:59 2020 From: sosipiuk at gmail.com (=?UTF-8?Q?S=C5=82awomir_Osipiuk?=) Date: Mon, 20 Apr 2020 12:38:59 -0400 Subject: Why is tab unaffected by font whereas space is affected? In-Reply-To: <83tv1egr5o.fsf@gnu.org> References: <20200417160941.11e29ce9@JRWUBU2> <83a738k366.fsf@gnu.org> <83tv1egr5o.fsf@gnu.org> Message-ID: On Mon, Apr 20, 2020 at 10:39 AM Eli Zaretskii via Unicode wrote: > > That would mean, for example, that if you make the font smaller, the > tab stops stay in the same positions, pixel-wise? isn't that strange? I don't think it's strange to define a tab position as a fraction of a line, rather than a number of (font-dependent) space widths. Both approaches have advantages and disadvantages. From asmusf at ix.netcom.com Mon Apr 20 15:48:09 2020 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Mon, 20 Apr 2020 13:48:09 -0700 Subject: Why is tab unaffected by font whereas space is affected? In-Reply-To: <83tv1egr5o.fsf@gnu.org> References: <20200417160941.11e29ce9@JRWUBU2> <83a738k366.fsf@gnu.org> <83tv1egr5o.fsf@gnu.org> Message-ID: <2d634301-b25d-c62f-4fd4-11610e1062e9@ix.netcom.com> An HTML attachment was scrubbed... URL: From richard.wordingham at ntlworld.com Tue Apr 21 02:49:34 2020 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Tue, 21 Apr 2020 08:49:34 +0100 Subject: Why is tab unaffected by font whereas space is affected? In-Reply-To: <2d634301-b25d-c62f-4fd4-11610e1062e9@ix.netcom.com> References: <20200417160941.11e29ce9@JRWUBU2> <83a738k366.fsf@gnu.org> <83tv1egr5o.fsf@gnu.org> <2d634301-b25d-c62f-4fd4-11610e1062e9@ix.netcom.com> Message-ID: <20200421084934.60f227f8@JRWUBU2> On Mon, 20 Apr 2020 13:48:09 -0700 Asmus Freytag via Unicode wrote: > But generally, in rich text environment, these are properties of > blocks of text (paragraphs) and don't track with font size. Surely the primary target of a text editor is plain text. This is why one would expect differences between a word processor and a program editor. Of course, with a proportional width font, it is rather difficult to interpret a tab position of '8 characters'. Interpreting a unit of 'character' works if tab characters are only used for indentation. It doesn't work so well if it is used to separate data in a table. Fixed width fonts for multilingual data tables are not necessarily available, though Evertype provides a wide coverage for a sane but not universally adopted definition of 'fixed width'. Richard. From samjnaa at gmail.com Tue Apr 21 11:03:22 2020 From: samjnaa at gmail.com (Shriramana Sharma) Date: Tue, 21 Apr 2020 21:33:22 +0530 Subject: Basic Unicode character/string support absent even in modern C++ Message-ID: char16_t and char32_t along with the corresponding string types u16string and u32string were added in C++11: https://en.cppreference.com/w/cpp/language/types https://en.cppreference.com/w/cpp/string But till date one can't write any of them to cout. A simple cout << u'?' or cout << u"????" doesn't work and throws umpteen lines of obscure compiler errors. Some relevant threads: https://stackoverflow.com/q/6020314/1503120 https://stackoverflow.com/q/5611759/1503120 I really don't understand the point of having character and string types if you can't print them! I don't accept the rationale (which seems to be mentioned in the top answer to that first question) that there isn't so much demand for writing to such an encoding. First of all, the encoding exists precisely because it's useful. 
Second, this is about writing *from* that encoding to plain cout which one assumes connects to a UTF-8 console. Or if that assumption isn't acceptable, then resolve it! Let there be a proper encoding setting for cout. It would seem that C++'s std::cout isn't really a "character" output (or is it console output) unlike Qt's QTextStream or Python's sys.stdout. Those seem to handle Unicode just fine. If there's someone here with the wherewithal to get this C++ situation fixed, my humble request to you to do so! -------------- next part -------------- An HTML attachment was scrubbed... URL: From tom at honermann.net Tue Apr 21 16:21:10 2020 From: tom at honermann.net (Tom Honermann) Date: Tue, 21 Apr 2020 17:21:10 -0400 Subject: Basic Unicode character/string support absent even in modern C++ In-Reply-To: References: Message-ID: <6951d88d-ae22-b583-8088-dacc2b318c4f@honermann.net> Hi Shriramana. The WG21 (C++ standard working group) SG16 (Unicode and text processing) study group is pursuing solutions for this issue.? I encourage you to reach out to them with your request and ideas.? See https://github.com/sg16-unicode/sg16. Unfortunately, it is a difficult issue to address in a portable way for a variety of reasons.? A few of these are discussed with regard to char8_t at https://stackoverflow.com/questions/58878651/what-is-the-printf-formatting-character-for-char8-t/58895428#58895428. Tom. On 4/21/20 12:03 PM, Shriramana Sharma via Unicode wrote: > char16_t and char32_t along with the corresponding string types > u16string and u32string were added in C++11: > > https://en.cppreference.com/w/cpp/language/types > https://en.cppreference.com/w/cpp/string > > But till date one can't write any of them to cout. A simple cout << > u'?' or cout << u"????" doesn't work and throws umpteen lines of > obscure compiler errors. > > Some relevant threads: > https://stackoverflow.com/q/6020314/1503120 > https://stackoverflow.com/q/5611759/1503120 > > I really don't understand the point of having character and string > types if you can't print them! > > I don't accept the rationale (which seems to be mentioned in the top > answer to that first question) that there isn't so much demand for > writing to such an encoding. > > First of all, the encoding exists precisely because it's useful. > Second, this is about writing *from* that encoding to plain cout which > one assumes connects to a UTF-8 console. Or if that assumption isn't > acceptable, then resolve it! Let there be a proper encoding setting > for cout. > > It would seem that C++'s std::cout isn't really a "character" output > (or is it console output) unlike Qt's QTextStream or Python's > sys.stdout. Those seem to handle Unicode just fine. > > If there's someone here with the wherewithal to get this C++ situation > fixed, my humble request to you to do so! From lyratelle at gmx.de Wed Apr 22 01:05:33 2020 From: lyratelle at gmx.de (Dominikus Dittes Scherkl) Date: Wed, 22 Apr 2020 08:05:33 +0200 Subject: Basic Unicode character/string support absent even in modern C++ In-Reply-To: References: Message-ID: <73f4233c-59e8-2631-af3b-d7c59571abd0@gmx.de> Am 21.04.20 um 18:03 schrieb Shriramana Sharma via Unicode: > char16_t and char32_t along with the corresponding string types > u16string and u32string were added in C++11: > > https://en.cppreference.com/w/cpp/language/types > https://en.cppreference.com/w/cpp/string > > But till date one can't write any of them to cout. A simple cout << u'?' > or cout << u"????" 
doesn't work and throws umpteen lines of obscure > compiler errors. Use a better programming language, like D, with direct unicode support: https://dlang.org -- Dominikus Dittes Scherkl From pkar at ieee.org Wed Apr 22 02:27:08 2020 From: pkar at ieee.org (Piotr Karocki) Date: Wed, 22 Apr 2020 09:27:08 +0200 Subject: Basic Unicode character/string support absent even in modern C++ In-Reply-To: <73f4233c-59e8-2631-af3b-d7c59571abd0@gmx.de> References: <73f4233c-59e8-2631-af3b-d7c59571abd0@gmx.de> Message-ID: <2f1cb58d199c350d061e12871d1578a7@mail.gmail.com> >Am 21.04.20 um 18:03 schrieb Shriramana Sharma via Unicode: >> char16_t and char32_t along with the corresponding string types >> u16string and u32string were added in C++11: >> >> https://en.cppreference.com/w/cpp/language/types >> https://en.cppreference.com/w/cpp/string >> >> But till date one can't write any of them to cout. A simple cout << u'?' >> or cout << u"????" doesn't work and throws umpteen lines of obscure >> compiler errors. >Use a better programming language, like D, with direct unicode support: > >https://dlang.org Or better C++ compiler, which understand Unicode source files. From samjnaa at gmail.com Wed Apr 22 05:47:09 2020 From: samjnaa at gmail.com (Shriramana Sharma) Date: Wed, 22 Apr 2020 16:17:09 +0530 Subject: Basic Unicode character/string support absent even in modern C++ In-Reply-To: <6951d88d-ae22-b583-8088-dacc2b318c4f@honermann.net> References: <6951d88d-ae22-b583-8088-dacc2b318c4f@honermann.net> Message-ID: Thanks for your reply. Glad to hear of this effort. Will look into it. For now have downloaded: https://github.com/nemtrif/utfcpp/ And added the following code to my program which now works: #include std::ostream & operator<<(std::ostream & os, std::u16string us) { return os << ::utf8::utf16to8(us); } On Wed, 22 Apr, 2020, 02:51 Tom Honermann, wrote: > Hi Shriramana. > > The WG21 (C++ standard working group) SG16 (Unicode and text processing) > study group is pursuing solutions for this issue. I encourage you to > reach out to them with your request and ideas. See > https://github.com/sg16-unicode/sg16. > > Unfortunately, it is a difficult issue to address in a portable way for > a variety of reasons. A few of these are discussed with regard to > char8_t at > > https://stackoverflow.com/questions/58878651/what-is-the-printf-formatting-character-for-char8-t/58895428#58895428 > . > > Tom. > > On 4/21/20 12:03 PM, Shriramana Sharma via Unicode wrote: > > char16_t and char32_t along with the corresponding string types > > u16string and u32string were added in C++11: > > > > https://en.cppreference.com/w/cpp/language/types > > https://en.cppreference.com/w/cpp/string > > > > But till date one can't write any of them to cout. A simple cout << > > u'?' or cout << u"????" doesn't work and throws umpteen lines of > > obscure compiler errors. > > > > Some relevant threads: > > https://stackoverflow.com/q/6020314/1503120 > > https://stackoverflow.com/q/5611759/1503120 > > > > I really don't understand the point of having character and string > > types if you can't print them! > > > > I don't accept the rationale (which seems to be mentioned in the top > > answer to that first question) that there isn't so much demand for > > writing to such an encoding. > > > > First of all, the encoding exists precisely because it's useful. > > Second, this is about writing *from* that encoding to plain cout which > > one assumes connects to a UTF-8 console. Or if that assumption isn't > > acceptable, then resolve it! 
Let there be a proper encoding setting > > for cout. > > > > It would seem that C++'s std::cout isn't really a "character" output > > (or is it console output) unlike Qt's QTextStream or Python's > > sys.stdout. Those seem to handle Unicode just fine. > > > > If there's someone here with the wherewithal to get this C++ situation > > fixed, my humble request to you to do so! > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From samjnaa at gmail.com Wed Apr 22 05:49:56 2020 From: samjnaa at gmail.com (Shriramana Sharma) Date: Wed, 22 Apr 2020 16:19:56 +0530 Subject: Basic Unicode character/string support absent even in modern C++ In-Reply-To: <73f4233c-59e8-2631-af3b-d7c59571abd0@gmx.de> References: <73f4233c-59e8-2631-af3b-d7c59571abd0@gmx.de> Message-ID: On Wed, 22 Apr, 2020, 11:41 Dominikus Dittes Scherkl via Unicode, < unicode at unicode.org> wrote: > > Use a better programming language, like D, with direct unicode support: > LoL yes I've been "d"abbling in D now and then and it's much cleaner than C++ but since I'm looking to use Qt for my frontend, it's C++ for me now. -------------- next part -------------- An HTML attachment was scrubbed... URL: From samjnaa at gmail.com Wed Apr 22 05:51:46 2020 From: samjnaa at gmail.com (Shriramana Sharma) Date: Wed, 22 Apr 2020 16:21:46 +0530 Subject: Basic Unicode character/string support absent even in modern C++ In-Reply-To: <2f1cb58d199c350d061e12871d1578a7@mail.gmail.com> References: <73f4233c-59e8-2631-af3b-d7c59571abd0@gmx.de> <2f1cb58d199c350d061e12871d1578a7@mail.gmail.com> Message-ID: On Wed, 22 Apr, 2020, 13:05 Piotr Karocki via Unicode, wrote: > >Am 21.04.20 um 18:03 schrieb Shriramana Sharma via Unicode: > >> char16_t and char32_t along with the corresponding string types > >> u16string and u32string were added in C++11: > >> > >> https://en.cppreference.com/w/cpp/language/types > >> https://en.cppreference.com/w/cpp/string > >> > >> But till date one can't write any of them to cout. A simple cout << u'?' > >> or cout << u"????" doesn't work and throws umpteen lines of obscure > >> compiler errors. > >Use a better programming language, like D, with direct unicode support: > > > >https://dlang.org > > Or better C++ compiler, which understand Unicode source files. > My latest GCC and Clang don't have any problem with the source files. The limitation is with the standard libraries which don't provide the required functionality. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pkar at ieee.org Wed Apr 22 06:33:43 2020 From: pkar at ieee.org (Piotr Karocki) Date: Wed, 22 Apr 2020 13:33:43 +0200 Subject: Basic Unicode character/string support absent even in modern C++ In-Reply-To: References: <73f4233c-59e8-2631-af3b-d7c59571abd0@gmx.de> <2f1cb58d199c350d061e12871d1578a7@mail.gmail.com> Message-ID: >> Or better C++ compiler, which understand Unicode source files. > My latest GCC and Clang don't have any problem with the source files. > The limitation is with the standard libraries which don't provide the > required functionality. But you wrote that you got messages from compiler not from runtime. And error from compiler is irrelevant to any error in any libraries, standard or not, as code is not executed yet. 
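A self-contained version of the workaround quoted above, assuming the header-only utfcpp library from the link given (its umbrella header is "utf8.h"; the exact include target was elided above), would look like the following. The output renders correctly only where the attached console expects UTF-8:

    #include <iostream>
    #include <ostream>
    #include <string>
    #include "utf8.h"  // assumed header for https://github.com/nemtrif/utfcpp

    // Transcode UTF-16 to UTF-8 and write the resulting bytes to the stream.
    std::ostream & operator<<(std::ostream & os, const std::u16string & us)
    {
        return os << utf8::utf16to8(us);
    }

    int main()
    {
        std::cout << std::u16string(u"abcd") << "\n";
        return 0;
    }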
From samjnaa at gmail.com Wed Apr 22 10:54:27 2020
From: samjnaa at gmail.com (Shriramana Sharma)
Date: Wed, 22 Apr 2020 21:24:27 +0530
Subject: Basic Unicode character/string support absent even in modern C++
In-Reply-To: References: <73f4233c-59e8-2631-af3b-d7c59571abd0@gmx.de> <2f1cb58d199c350d061e12871d1578a7@mail.gmail.com> Message-ID:

On Wed, Apr 22, 2020 at 5:09 PM Piotr Karocki via Unicode wrote:
>
> >> Or better C++ compiler, which understand Unicode source files.
> > My latest GCC and Clang don't have any problem with the source files.
> > The limitation is with the standard libraries which don't provide the
> > required functionality.
>
> But you wrote that you got messages from compiler not from runtime. And
> error from compiler is irrelevant to any error in any libraries, standard or
> not, as code is not executed yet.

??? The error is given by the compiler because the stdlib doesn't provide the necessary functionality, which is what I was lamenting. For a simple program:

#include <iostream>
#include <string>
int main() { std::cout << "abcd\n"; }

This works fine, but:

int main() { std::cout << u"abcd\n"; }

just prints out a hex value which is probably the pointer, and changing that to:

int main() { std::cout << std::u16string(u"abcd"); }

writes out *94* lines ending with an innocuous:

“1 error generated.”

all complaining about:

note: candidate function not viable: no known conversion from 'std::u16string' (aka 'basic_string<char16_t>') to 'XXX' for 2nd argument

And I realize I don't need to use 16-bit for Latin chars, but of course I'm using Indic chars in my actual program.

Anyhow, as I posted earlier, converting it to UTF-8 just works fine, but it would be good if there were some mechanism so that one doesn't have to do that manually, making the learning curve for new learners easier.

--
Shriramana Sharma ???????????? ???????????? ????????????

From richard.wordingham at ntlworld.com Wed Apr 22 11:52:18 2020
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Wed, 22 Apr 2020 17:52:18 +0100
Subject: Basic Unicode character/string support absent even in modern C++
In-Reply-To: References: <73f4233c-59e8-2631-af3b-d7c59571abd0@gmx.de> <2f1cb58d199c350d061e12871d1578a7@mail.gmail.com> Message-ID: <20200422175218.191e9c25@JRWUBU2>

On Wed, 22 Apr 2020 21:24:27 +0530 Shriramana Sharma via Unicode wrote:

> And I realize I don't need to use 16-bit for Latin chars, but of
> course I'm using Indic chars in my actual program.

Nitpick: If you're using Unicode, you need them for English "café" as opposed to "cafe".

Richard.

From d3ck0r at gmail.com Wed Apr 22 11:54:13 2020
From: d3ck0r at gmail.com (J Decker)
Date: Wed, 22 Apr 2020 09:54:13 -0700
Subject: Basic Unicode character/string support absent even in modern C++
In-Reply-To: References: <73f4233c-59e8-2631-af3b-d7c59571abd0@gmx.de> <2f1cb58d199c350d061e12871d1578a7@mail.gmail.com> Message-ID:

On Wed, Apr 22, 2020 at 9:01 AM Shriramana Sharma via Unicode <unicode at unicode.org> wrote:

> On Wed, Apr 22, 2020 at 5:09 PM Piotr Karocki via Unicode
> wrote:
> >
> > >> Or better C++ compiler, which understand Unicode source files.
> > > My latest GCC and Clang don't have any problem with the source files.
> > > The limitation is with the standard libraries which don't provide the
> > > required functionality.

I was rather astonished to learn that C is really just ASCII. How very limiting. Although, C/C++ surely have libraries that deal with such things? I have one for C, so I know that it's at least possible.
Text is kept internally as utf8 code unit arrays with a known length, so I can include '\0' in a string.

I would LOVE if C could magically substitute the string terminator with the byte 0xFF instead of 0x00. I noticed that valid utf8 encodings must always have at least 1 bit off.

> note: candidate function not viable: no known conversion from
> 'std::u16string' (aka 'basic_string<char16_t>') to 'XXX' for 2nd
> argument
>
> And I realize I don't need to use 16-bit for Latin chars, but of
> course I'm using Indic chars in my actual program.
>
> Anyhow, as I posted earlier, converting it to UTF-8 just works fine,
> but it would be good if there were some mechanism so that one doesn't have
> to do that manually, making the learning curve for new learners
> easier.

and I just fwrite( logstring, length, 1, stdout ); and get wide character and unicode support if the terminal supports it, or if the editor reading the redirected output supports it... (where logstring is something I printed into with something like vsnprintf() )...

And I copy and paste things a lot... I don't have a good entry method for codes.

This is my favorite thing to keep around to test: console.log( "Hello World; This is a test file." + "??????????" ); Because all of those are even surrogate pairs (0x10000+).

But that's some JS... but, then again, my C libraries are quite happy to take the utf8 strings from JS and regenerate them, just as a string of bytes. I have a rude text-xor masking routine that generates valid codepoints, but can result in 0; normally you can even still use strlen etc. to deal with strings; so I don't see why C++ strings would have so much more difficulty (other than not supporting them in text files; but, then again, that's what preprocessors are for, I suppose).

From aleksey.tulinov at gmail.com Wed Apr 22 12:52:51 2020
From: aleksey.tulinov at gmail.com (Aleksey Tulinov)
Date: Wed, 22 Apr 2020 20:52:51 +0300
Subject: Basic Unicode character/string support absent even in modern C++
In-Reply-To: References: <73f4233c-59e8-2631-af3b-d7c59571abd0@gmx.de> <2f1cb58d199c350d061e12871d1578a7@mail.gmail.com> Message-ID:

C is agnostic; there is not much difference between "0123456789" and "??????????", it's just a bitstream (or bytestream).

#include <stdio.h>

int main()
{
    printf("Hello World; This is a test file. "
           "??????????"
           "\n");
    return 0;
}

$ gcc -Wall -Wextra -pedantic test.c
$ ./a.out
Hello World; This is a test file. ??????????

Of course, if you want to manipulate strings like this, or count the number of characters in a string, then there has to be some concept of encoding, and then a concept of character. Then some concept of locale too, because "Dz" isn't always two characters; or rather, in one language it's two characters, in another language it's just one character, etc.

Wed, 22 Apr 2020 at 20:02, J Decker via Unicode wrote:
>
> On Wed, Apr 22, 2020 at 9:01 AM Shriramana Sharma via Unicode
> <unicode at unicode.org> wrote:
>
>> On Wed, Apr 22, 2020 at 5:09 PM Piotr Karocki via Unicode
>> wrote:
>> >
>> > >> Or better C++ compiler, which understand Unicode source files.
>> > > My latest GCC and Clang don't have any problem with the source files.
>> > > The limitation is with the standard libraries which don't provide the
>> > > required functionality.
>
> I was rather astonished to learn that C is really just ASCII. How very
> limiting.
> Although, C/C++ surely have libraries that deal with such things? I
> have one for C, so I know that it's at least possible. Text is kept
> internally as utf8 code unit arrays with a known length, so I can
> include '\0' in a string.
>
> I would LOVE if C could magically substitute the string terminator
> with the byte 0xFF instead of 0x00. I noticed that valid utf8
> encodings must always have at least 1 bit off.
>
>> note: candidate function not viable: no known conversion from
>> 'std::u16string' (aka 'basic_string<char16_t>') to 'XXX' for 2nd
>> argument
>>
>> And I realize I don't need to use 16-bit for Latin chars, but of
>> course I'm using Indic chars in my actual program.
>>
>> Anyhow, as I posted earlier, converting it to UTF-8 just works fine,
>> but it would be good if there were some mechanism so that one doesn't have
>> to do that manually, making the learning curve for new learners
>> easier.
>
> and I just fwrite( logstring, length, 1, stdout ); and get wide
> character and unicode support if the terminal supports it, or if the
> editor reading the redirected output supports it... (where logstring
> is something I printed into with something like vsnprintf() )...
>
> And I copy and paste things a lot... I don't have a good entry method
> for codes.
>
> This is my favorite thing to keep around to test: console.log( "Hello
> World; This is a test file." + "??????????" ); Because all of
> those are even surrogate pairs (0x10000+).
>
> But that's some JS... but, then again, my C libraries are quite happy to
> take the utf8 strings from JS and regenerate them, just as a string of
> bytes.
> I have a rude text-xor masking routine that generates valid codepoints,
> but can result in 0; normally you can even still use strlen etc. to
> deal with strings; so I don't see why C++ strings would have so much more
> difficulty (other than not supporting them in text files; but, then again,
> that's what preprocessors are for, I suppose).
>
>> --
>> Shriramana Sharma ???????????? ???????????? ????????????

From kent.b.karlsson at bahnhof.se Wed Apr 22 19:13:04 2020
From: kent.b.karlsson at bahnhof.se (Kent Karlsson)
Date: Thu, 23 Apr 2020 02:13:04 +0200
Subject: Why is tab unaffected by font whereas space is affected?
In-Reply-To: <2d634301-b25d-c62f-4fd4-11610e1062e9@ix.netcom.com> References: <20200417160941.11e29ce9@JRWUBU2> <83a738k366.fsf@gnu.org> <83tv1egr5o.fsf@gnu.org> <2d634301-b25d-c62f-4fd4-11610e1062e9@ix.netcom.com> Message-ID: <123C32C8-A478-48A9-BBC3-730543496417@bahnhof.se>

> 20 apr. 2020 kl. 22:48 skrev Asmus Freytag via Unicode:
>
> On 4/20/2020 7:36 AM, Eli Zaretskii via Unicode wrote:
>>> In summary, tab stops are at particular positions in a displayed line of text, and do not depend on
>>> font changes, or font size changes. In some contexts one can set the tab stops (not just using
>>> default positions), and they “stay” over font changes and font size changes, until tab stops are
>>> (re)set by a tab setting “command” (or paragraph property, or similar mechanism, depending
>>> on system).
>> That would mean, for example, that if you make the font smaller, the
>> tab stops stay in the same positions, pixel-wise? Isn't that strange?
>
> No, that's how every word processor and text layout system has worked
> from day one (except apparently programming editors).
>
> On typewriters, if your machine had adjustable tabs, you could change
> your mind about tab stop positions at any time while in the middle of
> a line.

For a manual (non-electric) typewriter I once had, that was true. (I did not keep it, which I now regret…)

For the Diablo typewriter terminal (Diablo 1620), yes, I have used one; it had no tab stops set at power-up (it then moved the print position to the “far right” when outputting a tab character). One had to explicitly set tab stops (with proprietary escape sequences) to get any tab stops. So I would agree that having some default tab stops is handy...

> But generally, in rich text environment, these are properties of
> blocks of text (paragraphs) and don't track with font size.

Indeed (in principle; not so sure tab stops need be bundled with paragraph properties). But one can say it like this: changing the *default* font (typeface and/or em size), as a preference setting that applies “globally” (i.e. not just to a text “run” (substring), for some notion of “globally”), if the application allows for such a preference setting, may imply a change of default tab stops (and some applications might have only the default tab stops, not any explicitly set ones).

So in this case one can eat the cake and have it too… Changing the default font *preference* (implicitly changing default tab stops “globally”, but not changing any explicitly set tab stops) is different from changing the font for a text *run* (which would never change any tab stops, whether default or explicit). (For emacs, there would only be the (currently set) default font, and (the currently set) default tab stops. A “word processor” may or may not allow user setting of the default font as a preference.)

/Kent K

From asmusf at ix.netcom.com Wed Apr 22 21:30:05 2020
From: asmusf at ix.netcom.com (Asmus Freytag (c))
Date: Wed, 22 Apr 2020 19:30:05 -0700
Subject: Why is tab unaffected by font whereas space is affected?
In-Reply-To: <123C32C8-A478-48A9-BBC3-730543496417@bahnhof.se> References: <20200417160941.11e29ce9@JRWUBU2> <83a738k366.fsf@gnu.org> <83tv1egr5o.fsf@gnu.org> <2d634301-b25d-c62f-4fd4-11610e1062e9@ix.netcom.com> <123C32C8-A478-48A9-BBC3-730543496417@bahnhof.se> Message-ID:

On 4/22/2020 5:13 PM, Kent Karlsson wrote:
>> 20 apr. 2020 kl. 22:48 skrev Asmus Freytag via Unicode:
>>
>> On typewriters, if your machine had adjustable tabs, you could change
>> your mind about tab stop positions at any time while in the middle of
>> a line.
>
> For a manual (non-electric) typewriter I once had, that was true. (I
> did not keep it, which I now regret…)
>
> For the Diablo typewriter terminal (Diablo 1620), yes, I have used one;
> it had *no* tab stops set at power-up (it then moved the print
> position to the “far right” when outputting a tab character). One
> /had/ to *explicitly* set tab stops (with proprietary escape
> sequences) to get any tab stops. So I would agree that having some
> default tab stops is handy...
>
>> But generally, in rich text environment, these are properties of
>> blocks of text (paragraphs) and don't track with font size.
>
> Indeed (in principle; not so sure tab stops need be bundled with
> paragraph properties). But one can say it like this:
>
> * changing the *default* font (typeface and/or em size), as a
>   /preference/ setting that applies “globally” (i.e. not just to a
>   text “run” (substring), for some notion of “globally”), if the
>   application allows for such a preference setting:
>   o that may imply a change of /default/ tab stops (and some
>     applications might have only the default tab stops, not any
>     explicitly set ones).
>
> So in this case one can eat the cake and have it too… Changing the
> *default font preference* (implicitly changing default tab
> stops “globally”, but not changing any explicitly set tab stops) is
> different from changing the *font for a text run* (which would never
> change any tab stops, whether default or explicit). (For emacs, there
> would only be the (currently set) default font, and (the currently
> set) default tab stops. A “word processor” may or may not allow user
> setting of the /default/ font as a preference.)

Such global properties work well for uni-font displays like programming editors, notepad etc. where there's a corresponding single, global font setting. At that point, letting the font choice influence the "step-width" for the tab stop isn't too far-fetched. But it would be a very specific use case and one that doesn't generalize well to editing rich text documents.

A./

From eliz at gnu.org Thu Apr 23 09:37:08 2020
From: eliz at gnu.org (Eli Zaretskii)
Date: Thu, 23 Apr 2020 17:37:08 +0300
Subject: Why is tab unaffected by font whereas space is affected?
In-Reply-To: (unicode@unicode.org) References: <20200417160941.11e29ce9@JRWUBU2> <83a738k366.fsf@gnu.org> <83tv1egr5o.fsf@gnu.org> <2d634301-b25d-c62f-4fd4-11610e1062e9@ix.netcom.com> <123C32C8-A478-48A9-BBC3-730543496417@bahnhof.se> Message-ID: <83y2qmclpn.fsf@gnu.org>

> Cc: unicode at unicode.org
> Date: Wed, 22 Apr 2020 19:30:05 -0700
> From: "Asmus Freytag \(c\) via Unicode"
>
> Such global properties work well for uni-font displays like programming editors, notepad etc. where there's a
> corresponding single, global font setting. At that point, letting the font choice influence the "step-width" for the
> tab stop isn't too far-fetched. But it would be a very specific use case and one that doesn't generalize well to
> editing rich text documents.

For the record: Emacs is not a uni-font editor. You can have many different fonts (and more generally, different typefaces) in the same document.
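The model this thread converges on can be summarized in a short C++ sketch (purely illustrative; the type and member names are invented): explicitly set tab stops are a paragraph property and ignore font runs entirely, and only the fallback grid may be derived from a font metric such as a typical space width:

    #include <vector>

    struct ParagraphFormat {
        std::vector<double> stops_pt;  // explicitly set tab stops, ascending, in points
        double default_grid_pt;        // fallback grid, e.g. 5 to 8 typical space widths
    };

    // Tab stop that a TAB at pen position x_pt advances to: the first explicit
    // stop past the pen if any, otherwise the next position on the default grid.
    double next_tab_stop(const ParagraphFormat & pf, double x_pt)
    {
        for (double stop : pf.stops_pt)
            if (stop > x_pt)
                return stop;
        int cells = static_cast<int>(x_pt / pf.default_grid_pt) + 1;
        return cells * pf.default_grid_pt;
    }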
From tom at honermann.net Thu Apr 23 14:43:22 2020 From: tom at honermann.net (Tom Honermann) Date: Thu, 23 Apr 2020 15:43:22 -0400 Subject: Basic Unicode character/string support absent even in modern C++ In-Reply-To: References: <73f4233c-59e8-2631-af3b-d7c59571abd0@gmx.de> <2f1cb58d199c350d061e12871d1578a7@mail.gmail.com> Message-ID: <3f4178af-3a23-cbe8-33ab-5f392bb0e9d7@honermann.net> On 4/22/20 11:54 AM, Shriramana Sharma via Unicode wrote: > On Wed, Apr 22, 2020 at 5:09 PM Piotr Karocki via Unicode > wrote: >>>> Or better C++ compiler, which understand Unicode source files. >>> My latest GCC and Clang don't have any problem with the source files. >>> The limitation is with the standard libraries which don't provide the >>> required functionality. >> But you wrote that you got messages from compiler not from runtime. And >> error from compiler is irrelevant to any error in any libraries, standard or >> not, as code is not executed yet. > ??? The error is given by the compiler because the stdlib doesn't > provide the necessary functionality, which is was I was lamenting. For > a simple program: > > #include > #include > int main() { std::cout << "abcd\n"; } > > This works fine, but: > > int main() { std::cout << u"abcd\n"; } > > just prints out a hex value which is probably the pointer, and changing that to: C++20 fixed this surprising and undesirable behavior when P1423 was adopted (see the proposal section of that paper and option 7).? The above code is now ill-formed in C++20. > > int main() { std::cout << std::u16string(u"abcd"); } > > writes out *94* lines ending with an innocuous: > > ?1 error generated.? > > all complaining about: > > note: candidate function not viable: no known conversion from > 'std::u16string' (aka 'basic_string') to 'XXX' for 2nd > argument > > And I realize I don't need to use 16-bit for Latin chars, but of > course I'm using Indic chars in my actual program. > > Anyhow, as I posted earlier, converting it to UTF-8 just works fine, > but it would be good if there's some mechanism that one doesn't have > to do that manually, making the learning curve for new learners > easier. From a portability standpoint, converting it to UTF-8 doesn't actually work fine.? Doing so might produce the behavior you want on platforms you care about, but it won't on all of them.? In particular, it won't work reliably on Windows (where the locale dependent encoding [*] is never UTF-8 and where the console is unable to display characters outside the BMP) or on z/OS (where the locale dependent encoding is EBCDIC based). Tom. [*]: Ok, very recent Windows releases have ways of making the locale use UTF-8, but it is still an experimental feature. > > > > -- > Shriramana Sharma ???????????? ???????????? ???????????? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tom at honermann.net Thu Apr 23 14:54:49 2020 From: tom at honermann.net (Tom Honermann) Date: Thu, 23 Apr 2020 15:54:49 -0400 Subject: Basic Unicode character/string support absent even in modern C++ In-Reply-To: References: <73f4233c-59e8-2631-af3b-d7c59571abd0@gmx.de> <2f1cb58d199c350d061e12871d1578a7@mail.gmail.com> Message-ID: On 4/22/20 12:54 PM, J Decker via Unicode wrote: > > > On Wed, Apr 22, 2020 at 9:01 AM Shriramana Sharma via Unicode > > wrote: > > On Wed, Apr 22, 2020 at 5:09 PM Piotr Karocki via Unicode > > wrote: > > > > >> Or better C++ compiler, which understand Unicode source files. > > > My latest GCC and Clang don't have any problem with the source > files. 
From tom at honermann.net Thu Apr 23 14:54:49 2020
From: tom at honermann.net (Tom Honermann)
Date: Thu, 23 Apr 2020 15:54:49 -0400
Subject: Basic Unicode character/string support absent even in modern C++
In-Reply-To:
References: <73f4233c-59e8-2631-af3b-d7c59571abd0@gmx.de>
 <2f1cb58d199c350d061e12871d1578a7@mail.gmail.com>
Message-ID:

On 4/22/20 12:54 PM, J Decker via Unicode wrote:
> On Wed, Apr 22, 2020 at 9:01 AM Shriramana Sharma via Unicode
> <unicode at unicode.org> wrote:
>> On Wed, Apr 22, 2020 at 5:09 PM Piotr Karocki via Unicode
>> wrote:
>>>> Or better C++ compiler, which understands Unicode source files.
>>> My latest GCC and Clang don't have any problem with the source files.
>>> The limitation is with the standard libraries which don't provide the
>>> required functionality.
>
> I was rather astonished to learn that C is really just ASCII. How
> very limiting.

C (and C++) are not ASCII. The standards specify abstract basic source
and execution character repertoires; implementations define what
character sets are used. In practice, these character sets are ASCII or
EBCDIC based. The standards also provide facilities to specify any
Unicode character via an escape sequence. Implementations map extended
characters onto these escape sequences in order to support source files
encoded with characters outside the basic character repertoires. (e.g.,
\u00e1 is a valid identifier and means the same thing as á written in an
implementation-supported source file encoding).

> Although, C/C++ surely have libraries that deal with such things? I
> have one for C, so I know that it's at least possible. Text is kept
> internally as utf8 code unit arrays with a known length, so I can
> include '\0' in a string.

Yes, there are libraries. ICU is the most well-known.

Tom.

> I would LOVE it if C could magically substitute the string terminator
> with the byte 0xFF instead of 0x00. I noticed that valid utf8
> encodings must always have at least 1 bit off.
>
>> note: candidate function not viable: no known conversion from
>> 'std::u16string' (aka 'basic_string<char16_t>') to 'XXX' for 2nd
>> argument
>>
>> And I realize I don't need to use 16-bit for Latin chars, but of
>> course I'm using Indic chars in my actual program.
>>
>> Anyhow, as I posted earlier, converting it to UTF-8 just works fine,
>> but it would be good if there's some mechanism so that one doesn't
>> have to do that manually, making the learning curve for new learners
>> easier.
>
> and I just fwrite( logstring, length, 1, stdout ); and get wide
> character and unicode support if the terminal supports it, or if the
> editor reading the redirected output supports it... (where logstring
> is some thing I printed into with like vsnprintf() )...
>
> ... and I copy and paste things a lot... I don't have a good entry
> method for codes.
>
> This is my favorite thing to keep around to test:
> console.log("Hello World; This is a test file."+"??????????");
> because all of those are even surrogate pairs (0x10000+).
> But that's some JS... but, then again, my C libraries are quite happy
> to take the utf8 strings from JS and regenerate them, just as a string
> of bytes.
> I have a rude text-xor masking routine that generates valid
> codepoints, but can result in 0; normally you can even still use
> strlen etc. to deal with strings; so I don't see why C++ strings would
> have so much more difficulty (other than not supporting them in text
> files; but, then again, that's what preprocessors are for, I suppose).
>
>> --
>> Shriramana Sharma
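As a concrete starting point for the ICU route: icu::UnicodeString holds
text as UTF-16 and can convert to UTF-8 on output. A minimal sketch,
assuming ICU is installed and linked (e.g. -licuuc); the variable names
are invented for the example:

#include <iostream>
#include <string>
#include <unicode/unistr.h>

int main() {
    // The \u00e1 escape is the same mechanism described above for
    // naming any Unicode character; inside ICU the string is UTF-16.
    icu::UnicodeString s(u"\u00e1bcd");
    std::string utf8;
    s.toUTF8String(utf8);        // UTF-16 -> UTF-8
    std::cout << utf8 << '\n';   // prints "ábcd" on a UTF-8 terminal
}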
From tom at honermann.net Thu Apr 23 15:44:08 2020
From: tom at honermann.net (Tom Honermann)
Date: Thu, 23 Apr 2020 16:44:08 -0400
Subject: [EXTERNAL] Re: Basic Unicode character/string support absent even in modern C++
In-Reply-To:
References: <73f4233c-59e8-2631-af3b-d7c59571abd0@gmx.de>
 <2f1cb58d199c350d061e12871d1578a7@mail.gmail.com>
Message-ID: <8c681bfd-13c0-f78e-f144-22c4695d2b53@honermann.net>

On 4/23/20 4:24 PM, Murray Sargent wrote:
> I write a lot of C++ code to process math text and use the Visual
> Studio C++ and clang compilers to compile for Win32, Mac, iOS, and
> Android platforms. These compilers support UTF-8 strings, literals and
> comments. They even allow math italic characters like 𝑎 (U+1D44E) to
> be used as variables! Using real math symbols instead of notation like
> 0x222B makes the code so much more readable. It's easy to have
> Devanagari string constants, braille constants (see the Unicode 2800
> block), etc.

There is a proposal in progress to place new restrictions on what
characters can be used in identifiers, following the guidelines in UAX
#31. Follow the progress of P1949 if you are interested (monitoring the
SG16 mailing list and meeting notes (see
https://github.com/sg16-unicode/sg16) is one way to do so). It is
*possible* that some of the identifiers you've been using may become
invalid if that proposal is adopted.

> C++ has grown up considerably past ASCII. But you still have to use
> ASCII operators such as != instead of ≠. It would be so fine if C++
> would allow standard math operators to be used as aliases for the
> ASCII operator pairs like !=, <=, >=

I'm not aware of any serious proposals to do so. But we are being
careful in our evaluation of P1949 to ensure that doing so will remain a
possibility in the future.

Note that P1949 will *not* allow doing something like the following.
This is to ensure the availability of ≠ for use as a future operator.

#define ≠ !=

int f(int a, int b) {
    return a ≠ b;
}

Tom.

> Murray
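A small sketch of the identifier rules under discussion. Spelled with
universal-character-names, the identifiers below are valid in standard
C++ today; a compiler that supports extended source characters (as
Murray reports for Visual Studio C++ and clang) also accepts them
written directly as 𝑎 and área, and both characters carry the XID_Start
property that P1949 would require:

#include <iostream>

int main() {
    // U+1D44E MATHEMATICAL ITALIC SMALL A as a variable name.
    double \U0001D44E = 2.0;
    // "área" with U+00E1; the UCN and direct spellings name the same
    // identifier.
    double \u00e1rea = \U0001D44E * \U0001D44E;
    std::cout << \u00e1rea << '\n';  // prints 4
}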
From rick at unicode.org Thu Apr 23 18:20:58 2020
From: rick at unicode.org (Rick McGowan)
Date: Thu, 23 Apr 2020 16:20:58 -0700
Subject: Unicode Locale Data v37 released!
Message-ID: <5EA222DA.1050001@unicode.org>

The final version of Unicode CLDR version 37 is now available. It
focuses on adding new locales, enhancing support for units of
measurement, adding annotations (names and search keywords) for symbols,
and adding annotations for Emoji v13.

Unicode CLDR provides an update to the key building blocks for software
supporting the world's languages. CLDR data is used by all major
software systems (including mobile phones) for their software
internationalization and localization, adapting software to the
conventions of different languages.

*Expanded locale preferences for units of measurement.* The new unit
preference and conversion data allows formatting functions to pick the
right measurement units for the locale and usage, and accurately convert
input measurements into those units.

*Emoji 13.0.* The emoji annotations (names and search keywords) for the
new Unicode 13.0 emoji are added. The collation sequences are updated
for new Unicode 13.0, and for emoji.

*Annotations (names and keywords) expanded to cover more than emoji.*
This release includes a small set of Unicode symbols (arrow, math,
punctuation, currency, alphanum, and geometric), with more to be added
in future releases. For example, see v37/annotations/romance.html .

*New locales.* New languages at *Basic* coverage: Fulah (Adlam),
Maithili, Manipuri, Santali, Sindhi (Devanagari), Sundanese. New
languages at *Modern* coverage: Nigerian Pidgin. See Locale Coverage
Data for the coverage per locale, for both new and old locales.

*Grammatical features added.* Grammatical features are added for many
languages, a first step toward allowing programmers to format units
according to grammatical context (e.g., the dative version of "3
kilometers").

*Updates to code sets.* In particular, the EU is updated (removing GB).

For more details, access to the data and charts, and important notes for
smoothly migrating implementations, see Unicode CLDR Version 37 .

From doug at ewellic.org Thu Apr 23 20:29:04 2020
From: doug at ewellic.org (Doug Ewell)
Date: Thu, 23 Apr 2020 19:29:04 -0600
Subject: Mail list archive
Message-ID: <000301d619d7$bdcbc740$396355c0$@ewellic.org>

Does anyone have an educated guess as to when the public mail list
archive on the Unicode web site will be brought up to date?

https://www.unicode.org/mail-arch/unicode-ml/y2020-m04/index.html was
last updated on April 2, three weeks ago, and there have been at least
38 posts to this list since then.

I suppose this is part of the broader question of when unicode.org will
be fully restored and reliable. It's still down perhaps 50% of the time
that I try to access it. Perhaps the daemon that creates the web archive
was disabled as part of the Great Crash.

--
Doug Ewell | Thornton, CO, US | ewellic.org

From rick at unicode.org Fri Apr 24 16:57:04 2020
From: rick at unicode.org (Rick McGowan)
Date: Fri, 24 Apr 2020 14:57:04 -0700
Subject: ICU 67 Released
Message-ID: <5EA360B0.2090700@unicode.org>

Unicode® ICU 67 has just been released. ICU 67 updates to CLDR 37 locale
data with many additions and corrections. This release also includes the
updates to Unicode 13, subsuming the special CLDR 36.1 and ICU 66
releases. ICU 67 includes many bug fixes for date and number formatting,
including enhanced support for user preferences in the locale
identifier. The LocaleMatcher code and data are improved, and number
skeletons have a new "concise" form that can be used in MessageFormat
strings.

ICU is a software library widely used by products and other libraries to
support the world's languages, implementing both the latest version of
the Unicode Standard and of the Unicode locale data (CLDR).

For details, please see http://site.icu-project.org/download/67.

http://blog.unicode.org/2020/04/icu-67-released.html
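As one concrete example of the unit work described in these two
announcements, ICU's NumberFormatter API can format a measurement for a
locale. A minimal sketch, assuming ICU 67 headers and libraries (e.g.
-licuuc -licui18n); the exact output string depends on the CLDR 37 data
that ships with ICU:

#include <iostream>
#include <string>
#include <unicode/locid.h>
#include <unicode/measunit.h>
#include <unicode/numberformatter.h>

int main() {
    UErrorCode status = U_ZERO_ERROR;
    // "3 kilometers" as the en-GB locale spells it.
    icu::UnicodeString formatted =
        icu::number::NumberFormatter::withLocale(icu::Locale("en-GB"))
            .unit(icu::MeasureUnit::getKilometer())
            .unitWidth(UNUM_UNIT_WIDTH_FULL_NAME)
            .formatDouble(3, status)
            .toString(status);
    std::string utf8;
    formatted.toUTF8String(utf8);
    std::cout << utf8 << '\n';   // e.g. "3 kilometres"
    return U_FAILURE(status) ? 1 : 0;
}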