From unicode at unicode.org Tue Sep 5 04:52:49 2017 From: unicode at unicode.org (=?UTF-8?Q?Martin_J._D=c3=bcrst?= via Unicode) Date: Tue, 5 Sep 2017 18:52:49 +0900 Subject: Assamese and Unicode. In-Reply-To: <559555838.976649.1503527718806@mail.yahoo.com> References: <559555838.976649.1503527718806.ref@mail.yahoo.com> <559555838.976649.1503527718806@mail.yahoo.com> Message-ID: <16c08956-dcae-8747-a09a-c99a795c0676@it.aoyama.ac.jp> Sorry for the long delay of this answer. On 2017/08/24 07:35, David Faulks via Unicode wrote: > It appears that the Indian government will submit an 'Assamese' proposal. > > http://silchar.com/unicode-standard-for-assamese-in-the-offing/ > > Since everything I know about Assamese Script indicates that it is basically the same as Bengali and the Unicode Assamese controversy is derived entirely from a sub-nationalistic fit over character and script names, I expect that this proposal will not be accepted. The best thing to do is to have lot's of content in Assamese in Unicode. This will show that things just work. This reminds me of the 1990ies, where many "experts" in Japan were complaining that Han Unification would destroy Japanese culture, but where writing this using software that used Unicode inside, thus providing a proof to the contrary. So the best thing to happen is to have this discussion in Assamese rather than in English, because then people eventually will see that there's no problem. Regards, Martin. > However, 'popular nationalism' will probably be used to attack Unicode then. > > David Faulks From unicode at unicode.org Tue Sep 5 07:40:43 2017 From: unicode at unicode.org (Shriramana Sharma via Unicode) Date: Tue, 5 Sep 2017 18:10:43 +0530 Subject: Assamese and Unicode. 
In-Reply-To: <16c08956-dcae-8747-a09a-c99a795c0676@it.aoyama.ac.jp>
References: <559555838.976649.1503527718806.ref@mail.yahoo.com> <559555838.976649.1503527718806@mail.yahoo.com> <16c08956-dcae-8747-a09a-c99a795c0676@it.aoyama.ac.jp>
Message-ID:

On 9/5/17, Martin J. Dürst via Unicode wrote:
> The best thing to do is to have lot's of content in Assamese in Unicode.
> This will show that things just work.

IIUC the problem is with Assamese not accepting the label "Bengali" for "their" script. AFAICS they do not deny that the encoding "just works".

--
Shriramana Sharma

From unicode at unicode.org Sun Sep 17 14:07:55 2017
From: unicode at unicode.org (Richard Wordingham via Unicode)
Date: Sun, 17 Sep 2017 20:07:55 +0100
Subject: Western numeral diacritics in complex scripts
Message-ID: <20170917200755.7f5a2895@JRWUBU2>

In philological work, one encounters the problem that two or more abstract characters have the same 'natural' transliteration; the same problem can apply to reconstructed phonemes, where there is no sound indication of the actual pronunciation. A common solution is to use a subscript or superscript numeral to distinguish them. The habit of using Western Arabic numbers ('0', '1' etc.) for this purpose has spread to such an extent that they occur in Sanskrit in the Tamil script, and this is recognised by SUBSCRIPT/SUPERSCRIPT TWO..FOUR having the Indic syllabic category of syllable modifier.

Where may the superscript digits go in encoding Sanskrit in the Tamil script? I have recently seen the digits placed after the more conventional akshara (counting visible pulli as terminating an akshara), but I thought I had also seen them rendered between a consonant and U+0B86 TAMIL LETTER AA. This scheme I had internalised as placement to the right of the glyph containing the consonant.

Now, I have just come across a similar device in a Lao description of the Tai Tham script.
I came across the attached table on p33 of "Venez Apprendre des Carart?res Dhammiques en ?criture de P?l? et de Sanskrit" by Chanthanom Deuanhaksa (http://lao-online.com/all_files/books/B01833.pdf). Rather than use U+1A46 TAI THAM LETTER HIGH SHA and U+1A47 TAI THAM LETTER HIGH SSA, which are in the font being used, the author has chosen to use the sequences ?? and ??.

The 'only' problem with representing the text in the Lao column (the 4th column) is that it uses letters that were officially removed from the alphabet a couple of generations ago, and therefore aren't yet in Unicode. I'd have to hope that I don't get script run breaks between base and combining mark. The subscript digits would go in their own grapheme clusters. Or am I wrong about them?

However, encoding the Tai Tham column presents several issues, for the sequences with digits also occur subscript. A simple ad hoc solution is to treat the sequences as glyph variants of HIGH SHA and HIGH SSA. However, would that comply with the Unicode standard? I think it would violate the character identities.

A secondary question is the character identity of the subscript SA. As the writing style used does not seem to contrast the non-spacing tailless subscript SA and the spacing 'subscript' SA with an ascending tail, I believe the appropriate encoding for them would be rather than . Does anyone demur?

For the fifth word, with usual transliteration _vṛkṣa_, I see several possibilities for the encoding of the second syllable:

A: . This recognises as a semantic unit, interpreting U+00B3 as a syllable modifier acting as a spacing nukta. Implementations of USE will object to having a syllable modifier before a dependent vowel, but I'd just have to relax the conditions for deleted dotted circles. An issue with this scheme is that without a logically following vowel, I'd have to separate a digit applying to the whole word, e.g.
for a plain text endnote, I'd have to separate consonant and digit by at least ZWNJ. Under this solution, the second syllable of the first Tai Tham word in the table would be encoded . In plain text, the visually identical (or very similar) would be used to indicate that an endnote applied. B: . Compared to A, this solves the end note problem. However, it would look wrong if it were then rendered in a font that uses a rising tail for ! It also violates the character identity of U+00B3. C: This scheme respects USE's rule that syllable modifiers come at the end of the syllable. However, there is a problem of knowing which consonant to associate it with. For example, in the Sanskrit form of Vishnu written in this style, the final syllable would be , and one would need special rules to associate the digit with U+1A48 rather than with U+1A31. D: Encode special 'nukta' marks for spacing digit diacritics when there are problems with using plain spacing superscript and subscript digits. Overall, I think solution A is better than Solution D. What do others think? Richard. -------------- next part -------------- A non-text attachment was scrubbed... Name: numbered_letters_cropped.png Type: image/png Size: 39728 bytes Desc: not available URL: From unicode at unicode.org Sat Sep 23 01:48:03 2017 From: unicode at unicode.org (Markus Scherer via Unicode) Date: Fri, 22 Sep 2017 23:48:03 -0700 Subject: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8 In-Reply-To: References: <20170517134156.665a7a7059d7ee80bb4d670165c8327d.0fd4eb2a77.wbe@email03.godaddy.com> <1c37163e-08a2-a46f-c46c-223b5ea67bd1@it.aoyama.ac.jp> <79d4a0dc-db16-3105-98fe-8c0f83938fc2@ix.netcom.com> <2db88940-6eae-eb74-b70f-5d2772f3738d@khwilliamson.com> <0fc11646-7af3-5945-2faf-3cf6d47b5189@it.aoyama.ac.jp> <076d2114-dd6c-b45b-4159-07469451967f@khwilliamson.com> Message-ID: FYI, I changed the ICU behavior for the upcoming ICU 60 release (pending code review). 
Proposal & description: https://sourceforge.net/p/icu/mailman/message/35990833/ Code changes: http://bugs.icu-project.org/trac/review/13311 Best regards, markus On Thu, Aug 3, 2017 at 5:34 PM, Mark Davis ?? wrote: > FYI, the UTC retracted the following. > > *[151-C19 ] Consensus:* Modify > the section on "Best Practices for Using FFFD" in section "3.9 Encoding > Forms" of TUS per the recommendation in L2/17-168 > , for > Unicode version 11.0. > > Mark > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Mon Sep 25 23:48:27 2017 From: unicode at unicode.org (Leo Broukhis via Unicode) Date: Mon, 25 Sep 2017 21:48:27 -0700 Subject: IBM 1620 invalid character symbol Message-ID: Wikipedia (https://en.wikipedia.org/wiki/IBM_1620#Invalid_character) describes the "invalid character" symbol (see attachment) as a Cyrillic ? which it obviously is not. But what is it? Does it deserve encoding, or is it a glyph variation of an existing codepoint? The question is somewhat prompted by 2BFF 1 HELLSCHREIBER PAUSE SYMBOL in the pipeline, although I learned about both earlier today within a few minutes of one another. Thanks, Leo -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: invalid.jpeg Type: image/jpeg Size: 15300 bytes Desc: not available URL: From unicode at unicode.org Tue Sep 26 00:34:24 2017 From: unicode at unicode.org (=?UTF-8?Q?Magnus_Bodin_=E2=98=80?= via Unicode) Date: Tue, 26 Sep 2017 07:34:24 +0200 Subject: IBM 1620 invalid character symbol In-Reply-To: References: Message-ID: It's like if IBM invented the tofu of some sort. 
(Well, this is something different but similar.)

On Tue, Sep 26, 2017 at 6:48 AM, Leo Broukhis via Unicode <
unicode at unicode.org> wrote:

> Wikipedia (https://en.wikipedia.org/wiki/IBM_1620#Invalid_character)
> describes the "invalid character" symbol (see attachment) as a Cyrillic Ж
> which it obviously is not.
>
> But what is it? Does it deserve encoding, or is it a glyph variation of an
> existing codepoint?
>
> The question is somewhat prompted by
>
> 2BFF 1 HELLSCHREIBER PAUSE SYMBOL
>
> in the pipeline, although I learned about both earlier today within a few
> minutes of one another.
>
> Thanks,
> Leo
>
>

From unicode at unicode.org Tue Sep 26 00:48:26 2017
From: unicode at unicode.org (Ken Whistler via Unicode)
Date: Mon, 25 Sep 2017 22:48:26 -0700
Subject: IBM 1620 invalid character symbol
In-Reply-To:
References:
Message-ID: <1ef6682e-c8f3-d570-d1c0-b684a66f6474@att.net>

The 1620 manual accessed from the Wiki page shows the same information but with a different glyph (which looks more like the capital zhe, and is presumably the source of the glyph cited in the Wiki page itself). See:

http://www.bitsavers.org/pdf/ibm/1620/A26-5706-3_IBM_1620_CPU_Model_1_Jul65.pdf

p. 52 of the document (56/99 of the pdf).

So there was some significant glyph variation in the 1620 documentation. My guess is that the invalid character tofu was implemented as an overprint symbol on the 1620 console typewriter (since the overlines and the strikethroughs clearly were). The whole system was basically using only a 50-character character set. But to verify exactly what was going on, somebody would presumably have to examine the physical keys of a 1620 console typewriter to see what they could generate on paper.

I'm guessing the Computer History Museum ( http://www.computerhistory.org/ ) would have one sitting around.

--Ken

On 9/25/2017 9:48 PM, Leo Broukhis via Unicode wrote:
> Wikipedia (https://en.wikipedia.org/wiki/IBM_1620#Invalid_character)
> describes the "invalid character" symbol (see attachment) as a
> Cyrillic Ж which it obviously is not.
>
> But what is it? Does it deserve encoding, or is it a glyph variation
> of an existing codepoint?
>

From unicode at unicode.org Tue Sep 26 01:22:25 2017
From: unicode at unicode.org (Leo Broukhis via Unicode)
Date: Mon, 25 Sep 2017 23:22:25 -0700
Subject: IBM 1620 invalid character symbol
In-Reply-To:
References:
Message-ID:

On Mon, Sep 25, 2017 at 10:34 PM, Magnus Bodin ☀ wrote:

> It's like if IBM invented the tofu of some sort.
>

Right. The question is, can it be considered a glyph variation of U+FFFF?

On a tangent: graphically, the closest glyph which is not a letter appears to be

U+1F74F Alchemical Symbol for Scepter of Jove

Its alchemical meaning is unclear.

From unicode at unicode.org Tue Sep 26 01:34:34 2017
From: unicode at unicode.org (Leo Broukhis via Unicode)
Date: Mon, 25 Sep 2017 23:34:34 -0700
Subject: IBM 1620 invalid character symbol
In-Reply-To: <1ef6682e-c8f3-d570-d1c0-b684a66f6474@att.net>
References: <1ef6682e-c8f3-d570-d1c0-b684a66f6474@att.net>
Message-ID:

The glyph there looks more like U+1D219 Greek vocal notation symbol-51:
http://shapecatcher.com/unicode/info/119321
than a Ж.

If it was implemented as an overprint, either )^H|^H( or \^H|^H/ and was intended to signify an invalid character (for example, in the text part of core dumps, where a period is used by hexdump -C), then there would not be a physical key to generate it.
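
The overprint idea in the message above can be illustrated with a minimal sketch. It assumes a typewriter-like device where ^H (backspace) moves the print position back one cell without erasing anything, so later characters are struck on top of earlier ones. The function name `overstrike_cells` is hypothetical and is not taken from any 1620 documentation; nothing here claims to reproduce what the 1620 console actually did.

```python
def overstrike_cells(stream: str) -> list[str]:
    """Group characters by print position.

    Assumption: '\b' (^H) moves back one cell and erases nothing,
    so every character struck at a position is kept, in order.
    """
    cells: list[str] = []
    pos = 0
    for ch in stream:
        if ch == "\b":
            # Non-erasing backspace; never move left of the first cell.
            pos = max(0, pos - 1)
            continue
        if pos == len(cells):
            cells.append("")  # first strike at a new position
        cells[pos] += ch      # later strikes pile up in the same cell
        pos += 1
    return cells


if __name__ == "__main__":
    # ")^H|^H(" between two ordinary letters: three strikes share one cell.
    print(overstrike_cells("A)\b|\b(B"))
```

Each returned cell lists the characters struck at that position, so a cell holding more than one character is a composite that no single key or type element would need to provide.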
On Mon, Sep 25, 2017 at 10:48 PM, Ken Whistler via Unicode < unicode at unicode.org> wrote: > The 1620 manual accessed from the Wiki page shows the same information but > with a different glyph (which looks more like the capital zhe, and is > presumably the source of the glyph cited in the Wiki page itself). See: > > http://www.bitsavers.org/pdf/ibm/1620/A26-5706-3_IBM_1620_CP > U_Model_1_Jul65.pdf > > p. 52 of the document (56/99 of the pdf). > > So there was some significant glyph variation in the 1620 documentation. > My guess is that the invalid character tofu was implemented as an overprint > symbol on the 1620 console typewriter (since the overlines and the > strikethroughs clearly were). The whole system was basically using only a > 50-character character set. But to verify exactly what was going on, > somebody would presumably have to examine the physical keys of a 1620 > console typewriter to see what they could generate on paper. > > I'm guessing the Computer History Museum ( http://www.computerhistory.org/ > ) would have one sitting around. > > --Ken > > > > On 9/25/2017 9:48 PM, Leo Broukhis via Unicode wrote: > >> Wikipedia (https://en.wikipedia.org/wiki/IBM_1620#Invalid_character) >> describes the "invalid character" symbol (see attachment) as a Cyrillic ? >> which it obviously is not. >> >> But what is it? Does it deserve encoding, or is it a glyph variation of >> an existing codepoint? >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Tue Sep 26 02:27:03 2017 From: unicode at unicode.org (Karl Pentzlin via Unicode) Date: Tue, 26 Sep 2017 09:27:03 +0200 Subject: IBM 1620 invalid character symbol In-Reply-To: References: Message-ID: <451945285.20170926092703@acssoft.de> For me, the glyph looks like the proposed and accepted U+2BF0 ERIS FORM ONE (see pipeline; proposed as U+2BBA in L2/16-173R). - Karl -- Am Dienstag, 26. 
September 2017 um 06:48 schrieb Leo Broukhis via Unicode: >> Wikipedia >> (https://en.wikipedia.org/wiki/IBM_1620#Invalid_character) >> describes the "invalid character" symbol (see attachment) as a >> Cyrillic ? which it obviously is not.? >> But what is it? Does it deserve encoding, or is it a glyph >> variation of an existing codepoint? From unicode at unicode.org Tue Sep 26 02:47:27 2017 From: unicode at unicode.org (Leo Broukhis via Unicode) Date: Tue, 26 Sep 2017 00:47:27 -0700 Subject: IBM 1620 invalid character symbol In-Reply-To: <451945285.20170926092703@acssoft.de> References: <451945285.20170926092703@acssoft.de> Message-ID: On Tue, Sep 26, 2017 at 12:27 AM, Karl Pentzlin wrote: > For me, the glyph looks like the proposed and accepted U+2BF0 ERIS FORM ONE > (see pipeline; proposed as U+2BBA in L2/16-173R). > That's a perfect graphical match. I propose an annotation "Also an early IBM invalid character symbol". > - Karl > > -- > Am Dienstag, 26. September 2017 um 06:48 schrieb Leo Broukhis via Unicode: > > >> Wikipedia > >> (https://en.wikipedia.org/wiki/IBM_1620#Invalid_character) > >> describes the "invalid character" symbol (see attachment) as a > >> Cyrillic ? which it obviously is not. > > >> But what is it? Does it deserve encoding, or is it a glyph > >> variation of an existing codepoint? > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Tue Sep 26 02:55:46 2017 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Tue, 26 Sep 2017 09:55:46 +0200 Subject: IBM 1620 invalid character symbol In-Reply-To: References: <1ef6682e-c8f3-d570-d1c0-b684a66f6474@att.net> Message-ID: I think it was simpler: "X^H|" 2017-09-26 8:34 GMT+02:00 Leo Broukhis via Unicode : > The glyph there looks more like U+1D219 Greek vocal notation symbol-51: > http://shapecatcher.com/unicode/info/119321 > than a ?. 
> > If it was implemented as an overprint, either )^H|^H( or \^H|^H/ and was > intended to signify an invalid character > (for example, in the text part of core dumps, where a period is used by > hexdump -C), then there would not be a physical key to generate it. > > > > > > On Mon, Sep 25, 2017 at 10:48 PM, Ken Whistler via Unicode < > unicode at unicode.org> wrote: > >> The 1620 manual accessed from the Wiki page shows the same information >> but with a different glyph (which looks more like the capital zhe, and is >> presumably the source of the glyph cited in the Wiki page itself). See: >> >> http://www.bitsavers.org/pdf/ibm/1620/A26-5706-3_IBM_1620_CP >> U_Model_1_Jul65.pdf >> >> p. 52 of the document (56/99 of the pdf). >> >> So there was some significant glyph variation in the 1620 documentation. >> My guess is that the invalid character tofu was implemented as an overprint >> symbol on the 1620 console typewriter (since the overlines and the >> strikethroughs clearly were). The whole system was basically using only a >> 50-character character set. But to verify exactly what was going on, >> somebody would presumably have to examine the physical keys of a 1620 >> console typewriter to see what they could generate on paper. >> >> I'm guessing the Computer History Museum ( http://www.computerhistory.org >> / ) would have one sitting around. >> >> --Ken >> >> >> >> On 9/25/2017 9:48 PM, Leo Broukhis via Unicode wrote: >> >>> Wikipedia (https://en.wikipedia.org/wiki/IBM_1620#Invalid_character) >>> describes the "invalid character" symbol (see attachment) as a Cyrillic ? >>> which it obviously is not. >>> >>> But what is it? Does it deserve encoding, or is it a glyph variation of >>> an existing codepoint? >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From unicode at unicode.org Tue Sep 26 08:03:00 2017
From: unicode at unicode.org (John W Kennedy via Unicode)
Date: Tue, 26 Sep 2017 09:03:00 -0400
Subject: IBM 1620 invalid character symbol
In-Reply-To:
References:
Message-ID: <4C6D774A-C892-456A-BC4C-7D75E7C4735F@gmail.com>

I don't know what your snippet is from, but the normally authoritative IBM manual, A26-5706-3, IBM 1620 CPU Model 1 (July, 1965) displays what is clearly the Cyrillic letter. Whether it should be regarded as that, or as a distinct character, is another question. See http://www.bitsavers.org/pdf/ibm/1620/A26-5706-3_IBM_1620_CPU_Model_1_Jul65.pdf

> On Sep 26, 2017, at 12:48 AM, Leo Broukhis via Unicode wrote:
>
> Wikipedia (https://en.wikipedia.org/wiki/IBM_1620#Invalid_character) describes the "invalid character" symbol (see attachment) as a Cyrillic Ж which it obviously is not.
>
> But what is it? Does it deserve encoding, or is it a glyph variation of an existing codepoint?
>
> The question is somewhat prompted by
>
> 2BFF 1 HELLSCHREIBER PAUSE SYMBOL
>
> in the pipeline, although I learned about both earlier today within a few minutes of one another.
>
> Thanks,
> Leo
>
>

From unicode at unicode.org Tue Sep 26 08:20:27 2017
From: unicode at unicode.org (=?UTF-8?Q?Martin_J._D=c3=bcrst?= via Unicode)
Date: Tue, 26 Sep 2017 22:20:27 +0900
Subject: IBM 1620 invalid character symbol
In-Reply-To: <4C6D774A-C892-456A-BC4C-7D75E7C4735F@gmail.com>
References: <4C6D774A-C892-456A-BC4C-7D75E7C4735F@gmail.com>
Message-ID: <083e66e8-0dfe-52bc-52d7-54ffaedace10@it.aoyama.ac.jp>

On 2017/09/26 22:03, John W Kennedy via Unicode wrote:
> I don't know what your snippet is from, but the normally authoritative IBM manual, A26-5706-3, IBM 1620 CPU Model 1 (July, 1965) displays what is clearly the Cyrillic letter. Whether it should be regarded as that, or as a distinct character, is another question.
See http://www.bitsavers.org/pdf/ibm/1620/A26-5706-3_IBM_1620_CPU_Model_1_Jul65.pdf

What page?

Regards, Martin.

From unicode at unicode.org Tue Sep 26 08:28:53 2017
From: unicode at unicode.org (=?UTF-8?Q?=22J=C3=B6rg_Knappen=22?= via Unicode)
Date: Tue, 26 Sep 2017 15:28:53 +0200
Subject: Aw: Re: IBM 1620 invalid character symbol
In-Reply-To: <4C6D774A-C892-456A-BC4C-7D75E7C4735F@gmail.com>
References: <4C6D774A-C892-456A-BC4C-7D75E7C4735F@gmail.com>
Message-ID:

An HTML attachment was scrubbed...
URL:

From unicode at unicode.org Tue Sep 26 08:28:37 2017
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Tue, 26 Sep 2017 15:28:37 +0200
Subject: IBM 1620 invalid character symbol
In-Reply-To: <4C6D774A-C892-456A-BC4C-7D75E7C4735F@gmail.com>
References: <4C6D774A-C892-456A-BC4C-7D75E7C4735F@gmail.com>
Message-ID:

This is what the editor of the manual printed, probably using metal type. However, I doubt the actual typewriter had this symbol on its wheel of hammers; it was probably just overstriking the two letters X and I.

2017-09-26 15:03 GMT+02:00 John W Kennedy via Unicode :

> I don't know what your snippet is from, but the normally authoritative IBM
> manual, A26-5706-3, IBM 1620 CPU Model 1 (July, 1965) displays what is
> clearly the Cyrillic letter. Whether it should be regarded as that, or as a
> distinct character, is another question. See http://www.bitsavers.org/
> pdf/ibm/1620/A26-5706-3_IBM_1620_CPU_Model_1_Jul65.pdf
>
>
>
> On Sep 26, 2017, at 12:48 AM, Leo Broukhis via Unicode <
> unicode at unicode.org> wrote:
>
> Wikipedia (https://en.wikipedia.org/wiki/IBM_1620#Invalid_character)
> describes the "invalid character" symbol (see attachment) as a Cyrillic Ж
> which it obviously is not.
>
> But what is it? Does it deserve encoding, or is it a glyph variation of an
> existing codepoint?
> > The question is somewhat prompted by > > 2BFF 1 HELLSCHREIBER PAUSE SYMBOL > > in the pipeline, although I learned about both earlier today within a few > minutes of one another. > > Thanks, > Leo > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Tue Sep 26 08:34:15 2017 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Tue, 26 Sep 2017 15:34:15 +0200 Subject: IBM 1620 invalid character symbol In-Reply-To: <083e66e8-0dfe-52bc-52d7-54ffaedace10@it.aoyama.ac.jp> References: <4C6D774A-C892-456A-BC4C-7D75E7C4735F@gmail.com> <083e66e8-0dfe-52bc-52d7-54ffaedace10@it.aoyama.ac.jp> Message-ID: But what is interesting is the use of negative digits (-1 to -9, with the minus sign above the digit; I've not seen a case of minus 0, not needed apparently by the described operations) How do you encode these negative decimal digits in Unicode ? with a macron diacritic ? 2017-09-26 15:20 GMT+02:00 Martin J. D?rst via Unicode : > On 2017/09/26 22:03, John W Kennedy via Unicode wrote: > >> I don?t know what your snippet is from, but the normally authoritative >> IBM manual, A26-5706-3, IBM 1620 CPU Model 1 (July, 1965) displays what is >> clearly the Cyrillic letter. Whether it should be regarded as that, or as a >> distinct character, is another question. See >> http://www.bitsavers.org/pdf/ibm/1620/A26-5706-3_IBM_1620_CP >> U_Model_1_Jul65.pdf >> > > What page? > > Regards, Martin. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Tue Sep 26 09:03:52 2017 From: unicode at unicode.org (Karl Pentzlin via Unicode) Date: Tue, 26 Sep 2017 16:03:52 +0200 Subject: IBM 1620 invalid character symbol In-Reply-To: References: <4C6D774A-C892-456A-BC4C-7D75E7C4735F@gmail.com> Message-ID: <206583734.20170926160352@acssoft.de> Thank you for the information. 
I attach an enlarged part of the scan which in my eyes shows a specific design at least for the printed text (maybe inspired by " } ^H | ^H { "). - Karl -- Am Dienstag, 26. September 2017 um 15:28 schrieb J?rg Knappen via Unicode: >> I found the character in question on p. 52, it is a picture of >> something handwritten, not a typeset character. "Clearly" means something different to me. >> >> --J?rg Knappen >> >> Gesendet: Dienstag, 26. September 2017 um 15:03 Uhr >> Von: "John W Kennedy via Unicode" >> An: "Leo Broukhis" , unicode at unicode.org >> Betreff: Re: IBM 1620 invalid character symbol >> I don?t know what your snippet is from, but the normally >> authoritative IBM manual, A26-5706-3, IBM 1620 CPU Model 1 >> (July, 1965) displays what is clearly the Cyrillic letter. >> Whether it should be regarded as that, or as a distinct >> character, is another question. See >> http://www.bitsavers.org/pdf/ibm/1620/A26-5706-3_IBM_1620_CPU_Model_1_Jul65.pdf >> >> On Sep 26, 2017, at 12:48 AM, Leo Broukhis via Unicode wrote: >> >> Wikipedia >> (https://en.wikipedia.org/wiki/IBM_1620#Invalid_character) >> describes the "invalid character" symbol (see attachment) as a >> Cyrillic ? which it obviously is not. >> >> But what is it? Does it deserve encoding, or is it a glyph >> variation of an existing codepoint? >> >> The question is somewhat prompted by >> 2BFF >> 1 HELLSCHREIBER PAUSE SYMBOL >> >> in the pipeline, although I learned about both earlier today >> within a few minutes of one another. >> >> Thanks, >> Leo >> >> >> -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ibm3.png Type: image/png Size: 6276 bytes Desc: not available URL: From unicode at unicode.org Tue Sep 26 09:09:41 2017 From: unicode at unicode.org (John W Kennedy via Unicode) Date: Tue, 26 Sep 2017 10:09:41 -0400 Subject: IBM 1620 invalid character symbol In-Reply-To: <083e66e8-0dfe-52bc-52d7-54ffaedace10@it.aoyama.ac.jp> References: <4C6D774A-C892-456A-BC4C-7D75E7C4735F@gmail.com> <083e66e8-0dfe-52bc-52d7-54ffaedace10@it.aoyama.ac.jp> Message-ID: <48F6122D-DA9D-4E64-91E3-08B934169E67@gmail.com> The 56th page in the PDF, numbered 52. -- SKen Software, LLC Coming soon to an iPhone near you > On Sep 26, 2017, at 9:20 AM, Martin J. D?rst wrote: > >> On 2017/09/26 22:03, John W Kennedy via Unicode wrote: >> I don?t know what your snippet is from, but the normally authoritative IBM manual, A26-5706-3, IBM 1620 CPU Model 1 (July, 1965) displays what is clearly the Cyrillic letter. Whether it should be regarded as that, or as a distinct character, is another question. See http://www.bitsavers.org/pdf/ibm/1620/A26-5706-3_IBM_1620_CPU_Model_1_Jul65.pdf > > What page? > > Regards, Martin. From unicode at unicode.org Tue Sep 26 10:45:19 2017 From: unicode at unicode.org (Ken Whistler via Unicode) Date: Tue, 26 Sep 2017 08:45:19 -0700 Subject: IBM 1620 invalid character symbol In-Reply-To: References: <1ef6682e-c8f3-d570-d1c0-b684a66f6474@att.net> Message-ID: <047f2f71-14e6-f939-002c-c0cddb81aada@att.net> Leo, Yeah, I know. My point was that by examining the physical typewriter keys (the striking head on the typebar, not the images on the keypads), one could see what could be generated *by* overstriking. I think Philippe's suggestion that it was simply an overstrike of "X" with an "I" is probably the simplest explanation for the actual operation. And the typeset manuals just grabbed some type that looked similar. Note that the typewriters in question didn't have a vertical bar or backslash, apparently. 
But adding an annotation for similar-looking symbols that could be used for this is, I agree, probably better than looking for a proposal to encode some new symbol for this oddball construction. If it really is an overstrike, then technically, it could probably also be represented as the sequence <0058, 20D2>, just to represent the data. --Ken On 9/25/2017 11:34 PM, Leo Broukhis wrote: > If it was implemented as an overprint, either )^H|^H( or \^H|^H/ and > was intended to signify an invalid character > (for example, in the text part of core dumps, where a period is used > by hexdump -C), then there would not be a physical key to generate it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Tue Sep 26 10:53:50 2017 From: unicode at unicode.org (Ken Whistler via Unicode) Date: Tue, 26 Sep 2017 08:53:50 -0700 Subject: IBM 1620 invalid character symbol In-Reply-To: References: <4C6D774A-C892-456A-BC4C-7D75E7C4735F@gmail.com> <083e66e8-0dfe-52bc-52d7-54ffaedace10@it.aoyama.ac.jp> Message-ID: Philippe, Those aren't negative digits, per se. The usage in the manual is with an overline (or macron) to indicate the flag bit. It does occur over a zero, and in explanation in the text of floating point operations, it is also shown over letters (X, M, E) representing digits of the exponent and mantissa. See p. 27 (31 of the pdf) in that same manual, for an extensive discussion with lots of examples in the text: http://www.bitsavers.org/pdf/ibm/1620/A26-5706-3_IBM_1620_CPU_Model_1_Jul65.pdf The Unicode representation of the text material printed on that page would best be done with a combining macron, I think. --Ken On 9/26/2017 6:34 AM, Philippe Verdy via Unicode wrote: > But what is interesting is the use of negative digits (-1 to -9, with > the minus sign above the digit; I've not seen a case of minus 0, not > needed apparently by the described operations) > How do you encode these negative decimal digits in Unicode ? 
with a > macron diacritic ? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Tue Sep 26 08:56:04 2017 From: unicode at unicode.org (William_J_G Overington via Unicode) Date: Tue, 26 Sep 2017 14:56:04 +0100 (BST) Subject: IBM 1620 invalid character symbol In-Reply-To: References: <4C6D774A-C892-456A-BC4C-7D75E7C4735F@gmail.com> <083e66e8-0dfe-52bc-52d7-54ffaedace10@it.aoyama.ac.jp> Message-ID: <16851652.42640.1506434164882.JavaMail.defaultUser@defaultHost> A digit with a bar over the top is used to express the common logarithm of a number that is both greater than zero and also less than one. https://en.wikipedia.org/wiki/Common_logarithm William Overington Tuesday 26 September 2017 ----Original message---- >From : unicode at unicode.org Date : 2017/09/26 - 14:34 (GMTST) To : duerst at it.aoyama.ac.jp Cc : unicode at unicode.org, john.w.kennedy at gmail.com, leob at mailcom.com Subject : Re: IBM 1620 invalid character symbol But what is interesting is the use of negative digits (-1 to -9, with the minus sign above the digit; I've not seen a case of minus 0, not needed apparently by the described operations) How do you encode these negative decimal digits in Unicode ? with a macron diacritic ? 2017-09-26 15:20 GMT+02:00 Martin J. D?rst via Unicode : On 2017/09/26 22:03, John W Kennedy via Unicode wrote: I don?t know what your snippet is from, but the normally authoritative IBM manual, A26-5706-3, IBM 1620 CPU Model 1 (July, 1965) displays what is clearly the Cyrillic letter. Whether it should be regarded as that, or as a distinct character, is another question. See http://www.bitsavers.org/pdf/ibm/1620/A26-5706-3_IBM_1620_CPU_Model_1_Jul65.pdf What page? Regards, Martin. -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From unicode at unicode.org Tue Sep 26 11:23:56 2017
From: unicode at unicode.org (Ian Clifton via Unicode)
Date: Tue, 26 Sep 2017 16:23:56 +0000
Subject: IBM 1620 invalid character symbol
In-Reply-To: <16851652.42640.1506434164882.JavaMail.defaultUser@defaultHost>
References: <4C6D774A-C892-456A-BC4C-7D75E7C4735F@gmail.com> <083e66e8-0dfe-52bc-52d7-54ffaedace10@it.aoyama.ac.jp> , <16851652.42640.1506434164882.JavaMail.defaultUser@defaultHost>
Message-ID:

William Overington wrote:

> A digit with a bar over the top is used to express the common logarithm of a number that is both greater
> than zero and also less than one.

> https://en.wikipedia.org/wiki/Common_logarithm

Gosh, I'd forgotten that usage, although I now remember being taught it at school. Another use of the overbar for negative numbers is in crystallography; I believe the motivation is that it's useful to have a very compact notation for (often space-separated triples of) small integers of either sign.

--
Ian Clifton

From unicode at unicode.org Tue Sep 26 13:24:21 2017
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Tue, 26 Sep 2017 20:24:21 +0200
Subject: IBM 1620 invalid character symbol
In-Reply-To:
References: <4C6D774A-C892-456A-BC4C-7D75E7C4735F@gmail.com> <083e66e8-0dfe-52bc-52d7-54ffaedace10@it.aoyama.ac.jp>
Message-ID:

The doc designates those characters as negative digits. They are used during numeric processing as well, where they are referred to as "-1" .. "-9", and the doc explicitly says that the mark is a negative sign.

2017-09-26 17:53 GMT+02:00 Ken Whistler :

> Philippe,
>
> Those aren't negative digits, per se. The usage in the manual is with an
> overline (or macron) to indicate the flag bit. It does occur over a zero,
> and in explanation in the text of floating point operations, it is also
> shown over letters (X, M, E) representing digits of the exponent and
> mantissa. See p.
27 (31 of the pdf) in that same manual, for an extensive > discussion with lots of examples in the text: > > http://www.bitsavers.org/pdf/ibm/1620/A26-5706-3_IBM_1620_ > CPU_Model_1_Jul65.pdf > > The Unicode representation of the text material printed on that page would > best be done with a combining macron, I think. > > --Ken > > On 9/26/2017 6:34 AM, Philippe Verdy via Unicode wrote: > > But what is interesting is the use of negative digits (-1 to -9, with the > minus sign above the digit; I've not seen a case of minus 0, not needed > apparently by the described operations) > How do you encode these negative decimal digits in Unicode ? with a macron > diacritic ? > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Tue Sep 26 13:52:10 2017 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Tue, 26 Sep 2017 20:52:10 +0200 Subject: IBM 1620 invalid character symbol In-Reply-To: <047f2f71-14e6-f939-002c-c0cddb81aada@att.net> References: <1ef6682e-c8f3-d570-d1c0-b684a66f6474@att.net> <047f2f71-14e6-f939-002c-c0cddb81aada@att.net> Message-ID: 2017-09-26 17:45 GMT+02:00 Ken Whistler via Unicode : > Leo, > > Yeah, I know. My point was that by examining the physical typewriter keys > (the striking head on the typebar, not the images on the keypads), one > could see what could be generated *by* overstriking. I think Philippe's > suggestion that it was simply an overstrike of "X" with an "I" is probably > the simplest explanation for the actual operation. And the typeset manuals > just grabbed some type that looked similar. Note that the typewriters in > question didn't have a vertical bar or backslash, apparently. > > But adding an annotation for similar-looking symbols that could be used > for this is, I agree, probably better than looking for a proposal to encode > some new symbol for this oddball construction. 
> > If it really is an overstrike, then technically, it could probably also be > represented as the sequence <0058, 20D2>, just to represent the data. > > --Ken > Many old computers have used the overstriking (non-advancing) X to cancel some text. I bet that computer had the necessary printer control to lock the paper position when striking the X just before striking the I. On some terminals you could cancel the previous printed character by emitting a CANCEL or DELETE control and it would overstrike an X over the previous letter (remembering that these typewriters had only fixed-width characters). But this could have been slow if the carriage was not locked. Look at the ugly metallic bar on the right, which is there to guide the carriage and keep it from coming off its rails: the rightward force was certainly quite strong and had to be compensated. Emitting a backspace would have resulted in very slow printing, so there must have been a way to keep the carriage locked, to avoid advancing when striking the first letter before the next advancing letter. Yes there was no vertical bar on the keyboard, and only capital letters (so this cannot be a lowercase L). But it had a distinctive asterisk and there was certainly a need for distinction. I also bet that the two other symbols with the vertical bar and single or double horizontal bar (for data separators) were printed as well by the typewriter as overstrikes (capital letter I plus equal sign, or capital letter I and "dash", and note that this is named "dash" in the manual, not specifically "hyphen" or "minus"). At that time the precision of character encoding was not the goal. If that looked similar enough it was good enough. If there was a card puncher, and the cards were not only punched but also printed for reading by humans, apparently the printing was already done with a dot printer and the characters look a bit different (we can clearly see what the asterisk looks like).
However the cards shown may have been produced by another device. The encoding used (based on IBM punched cards) was also a (very incomplete) precursor of the EBCDIC encoding used later. It is fascinating how these machines could perform any arithmetic using a lookup table even for basic additions (the lookup was probably necessary due to the complex and ugly encoding and the way it was handling digits on 6 bits (4 BCD plus an additional sign bit and a flag bit used in intermediate computing steps)). But how could they even compile a Fortran program ? Probably the programs were compiled on a more powerful machine to generate the assembly code transferred to the machine using a card reader or paper tape. Anyway these machines were certainly complex to handle as the operator had to know the numeric assembly code to perform basic functions or preparation of the machine, or could easily corrupt the program, and had to remember some numeric addresses. -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Tue Sep 26 23:00:33 2017 From: unicode at unicode.org (Leo Broukhis via Unicode) Date: Tue, 26 Sep 2017 21:00:33 -0700 Subject: IBM 1620 invalid character symbol In-Reply-To: <047f2f71-14e6-f939-002c-c0cddb81aada@att.net> References: <1ef6682e-c8f3-d570-d1c0-b684a66f6474@att.net> <047f2f71-14e6-f939-002c-c0cddb81aada@att.net> Message-ID: Ken, The next time I'm at the Mountain View CHM, I'll try to ask. However, assuming it was an overstrike of an X and an I, then where does the "Eris"-like glyph come from? Was there ever an IBM font with a double-semicircular X like )( ? On Tue, Sep 26, 2017 at 8:45 AM, Ken Whistler wrote: > Leo, > > Yeah, I know. My point was that by examining the physical typewriter keys > (the striking head on the typebar, not the images on the keypads), one > could see what could be generated *by* overstriking.
I think Philippe's > suggestion that it was simply an overstrike of "X" with an "I" is probably > the simplest explanation for the actual operation. And the typeset manuals > just grabbed some type that looked similar. Note that the typewriters in > question didn't have a vertical bar or backslash, apparently. > > But adding an annotation for similar-looking symbols that could be used > for this is, I agree, probably better than looking for a proposal to encode > some new symbol for this oddball construction. > > If it really is an overstrike, then technically, it could probably also be > represented as the sequence <0058, 20D2>, just to represent the data. > > --Ken > > On 9/25/2017 11:34 PM, Leo Broukhis wrote: > > If it was implemented as an overprint, either )^H|^H( or \^H|^H/ and was > intended to signify an invalid character > (for example, in the text part of core dumps, where a period is used by > hexdump -C), then there would not be a physical key to generate it. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Wed Sep 27 11:32:54 2017 From: unicode at unicode.org (Ken Whistler via Unicode) Date: Wed, 27 Sep 2017 09:32:54 -0700 Subject: IBM 1620 invalid character symbol In-Reply-To: References: <1ef6682e-c8f3-d570-d1c0-b684a66f6474@att.net> <047f2f71-14e6-f939-002c-c0cddb81aada@att.net> Message-ID: <7c14c57f-ba3d-5ddd-4124-a0905b6eef11@att.net> Leo, On 9/26/2017 9:00 PM, Leo Broukhis via Unicode wrote: > The next time I'm at the Mountain View CHM, I'll try to ask. However, > assuming it was an overstrike of an X and an I, then where does the > "Eris"-like glyph come from? Was there ever an IBM font with a > double-semicircular X like )( ? > The reason for focusing on the hardware is that during operation of an IBM 1620, that is what would have been printed on paper by the actual machines, and what people would have seen? in core dumps, or whatever. 
The question of what was printed in the *documentation* is a different issue, really. That involves figuring out what the editors/typesetters of the manuals were doing to represent a symbol generated by overstriking by the hardware, for which they had no convenient type to use, by whatever word processing and printing technology they were using circa 1959. I suspect that both the "Zhe"-like glyph and the "Eris"-like glyph we have seen in the printed copies of the manual are themselves typesetter substituted glyphs for whatever the 1620 tofu glyph was that they were trying to represent. Where they got those glyphs, I dunno -- and it might be pretty difficult to track down, because almost all the folks who would have known what IBM manual typesetting practices were circa 1959 will have passed on by now. I don't know of any *standard* IBM glyph for this "Eris"-like thingie seen in the scanned bit of manual that started this thread -- but my documentation is from the 1980's era listings of standardized glyph identifiers. Who knows what was going on circa 1959, which predated most of the IBM efforts to standardize large glyph sets and large numbers of character sets? Back then, "fonts" consisted of what were cast on the typebars of typewriters, or on the strikers of line printers, or the physical type that typesetters used. Look at the archival pictures of the IBM 1620. Do you see any display font anywhere? That console is a Star-Trek style computer console -- all register lights and bit switches and rows of power station style light-up buttons. Not a font anywhere. The only font on that machine can be found by feeling the key strikers in the typewriter. 
--Ken From unicode at unicode.org Wed Sep 27 12:02:57 2017 From: unicode at unicode.org (Asmus Freytag via Unicode) Date: Wed, 27 Sep 2017 10:02:57 -0700 Subject: IBM 1620 invalid character symbol In-Reply-To: <7c14c57f-ba3d-5ddd-4124-a0905b6eef11@att.net> References: <1ef6682e-c8f3-d570-d1c0-b684a66f6474@att.net> <047f2f71-14e6-f939-002c-c0cddb81aada@att.net> <7c14c57f-ba3d-5ddd-4124-a0905b6eef11@att.net> Message-ID: An HTML attachment was scrubbed... URL: From unicode at unicode.org Wed Sep 27 12:42:54 2017 From: unicode at unicode.org (Ken Whistler via Unicode) Date: Wed, 27 Sep 2017 10:42:54 -0700 Subject: IBM 1620 invalid character symbol In-Reply-To: References: <1ef6682e-c8f3-d570-d1c0-b684a66f6474@att.net> <047f2f71-14e6-f939-002c-c0cddb81aada@att.net> <7c14c57f-ba3d-5ddd-4124-a0905b6eef11@att.net> Message-ID: <38ece121-10aa-27d5-64a7-d2f3163bf466@att.net> Asmus, On 9/27/2017 10:02 AM, Asmus Freytag via Unicode wrote: > > In that context it's worth remembering that there while you could say > for most typewriters that "the typewriter is the font", there were > noted exceptions. The IBM Selectric, for example, had exchangeable > type balls which allowed both a font and / or encoding change. > (Encoding understood here as association of character to key). > > That technology was then only two years in the future. > And in some sense, not even... ;-) By the 1950's (and probably earlier), enterprising linguists and other special users were conspiring with skilled typewriter repair experts to customize their manual typewriter keyboards and key strikers with custom fonts. I have an example sitting in my office -- an old Olympia manual typewriter with custom-cast type replacing the standard punches on some of the key strikers, and with custom engraved key caps added to the keyboard, to add schwa, eng, open-o, etc. to the typewriter. It also has the bottom dot of the colon *filed off* to create a middle dot key. 
Typing an actual colon on that machine requires an "input method" consisting of 3 key presses: {period, backspace, middledot} A couple of the keys that have raised accents on them were modified so as disable the platen advance, thereby becoming permanent "dead keys" -- effectively emulating the encoding of combining marks. There are probably thousands of such customized manual typewriters still sitting around, over and beyond the various standard manufactured models. --Ken -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Wed Sep 27 13:10:33 2017 From: unicode at unicode.org (Ken Shirriff via Unicode) Date: Wed, 27 Sep 2017 11:10:33 -0700 Subject: IBM 1620 invalid character symbol In-Reply-To: <38ece121-10aa-27d5-64a7-d2f3163bf466@att.net> References: <1ef6682e-c8f3-d570-d1c0-b684a66f6474@att.net> <047f2f71-14e6-f939-002c-c0cddb81aada@att.net> <7c14c57f-ba3d-5ddd-4124-a0905b6eef11@att.net> <38ece121-10aa-27d5-64a7-d2f3163bf466@att.net> Message-ID: The IBM type catalog might be of interest. It describes in great detail the character sets of the IBM typewriters and line printers and the custom characters that can be ordered for printer chains and Selectric type balls. Link: http://bitsavers.org/pdf/ibm/serviceForConsultants/Service_For_Consultants_198312_Complete/15_Type_Catalog.pdf I'm asking my sources to see if I can find out more about the 1620's characters, but haven't come up with anything concrete yet. Ken On Wed, Sep 27, 2017 at 10:42 AM, Ken Whistler via Unicode < unicode at unicode.org> wrote: > Asmus, > On 9/27/2017 10:02 AM, Asmus Freytag via Unicode wrote: > > In that context it's worth remembering that there while you could say for > most typewriters that "the typewriter is the font", there were noted > exceptions. The IBM Selectric, for example, had exchangeable type balls > which allowed both a font and / or encoding change. (Encoding understood > here as association of character to key). 
> > That technology was then only two years in the future. > > > And in some sense, not even... ;-) > > By the 1950's (and probably earlier), enterprising linguists and other > special users were conspiring with skilled typewriter repair experts to > customize their manual typewriter keyboards and key strikers with custom > fonts. I have an example sitting in my office -- an old Olympia manual > typewriter with custom-cast type replacing the standard punches on some of > the key strikers, and with custom engraved key caps added to the keyboard, > to add schwa, eng, open-o, etc. to the typewriter. It also has the bottom > dot of the colon *filed off* to create a middle dot key. Typing an actual > colon on that machine requires an "input method" consisting of 3 key > presses: {period, backspace, middledot} A couple of the keys that have > raised accents on them were modified so as disable the platen advance, > thereby becoming permanent "dead keys" -- effectively emulating the > encoding of combining marks. There are probably thousands of such > customized manual typewriters still sitting around, over and beyond the > various standard manufactured models. > > --Ken > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Wed Sep 27 13:24:46 2017 From: unicode at unicode.org (John W Kennedy via Unicode) Date: Wed, 27 Sep 2017 14:24:46 -0400 Subject: IBM 1620 invalid character symbol In-Reply-To: References: <1ef6682e-c8f3-d570-d1c0-b684a66f6474@att.net> <047f2f71-14e6-f939-002c-c0cddb81aada@att.net> <7c14c57f-ba3d-5ddd-4124-a0905b6eef11@att.net> Message-ID: <305319A1-A8D5-4009-8179-60E8702E4CC4@gmail.com> Indeed, the later 1620-2 was equipped with a Selectric, which probably has something to do with the fact that the Ж-like character was replaced on that model by the "pillow" character (which doesn't seem to be available in Unicode at all).
> On Sep 27, 2017, at 1:02 PM, Asmus Freytag via Unicode wrote: > > On 9/27/2017 9:32 AM, Ken Whistler via Unicode wrote: >> The only font on that machine can be found by feeling the key strikers in the typewriter. > In that context it's worth remembering that there while you could say for most typewriters that "the typewriter is the font", there were noted exceptions. The IBM Selectric, for example, had exchangeable type balls which allowed both a font and / or encoding change. (Encoding understood here as association of character to key). > > That technology was then only two years in the future. > > Other typewriters used interchangeable type wheels for the same purpose, but I believe that generally came later. > > A./ -- John W Kennedy "Harriet thanked Heaven, with grim amusement, for the scholarly habit; at least, one did not have to argue about what was or was not evidence." -- Dorothy L. Sayers: "Gaudy Night" From unicode at unicode.org Wed Sep 27 13:53:47 2017 From: unicode at unicode.org (Ken Whistler via Unicode) Date: Wed, 27 Sep 2017 11:53:47 -0700 Subject: IBM 1620 invalid character symbol In-Reply-To: References: <1ef6682e-c8f3-d570-d1c0-b684a66f6474@att.net> <047f2f71-14e6-f939-002c-c0cddb81aada@att.net> <7c14c57f-ba3d-5ddd-4124-a0905b6eef11@att.net> <38ece121-10aa-27d5-64a7-d2f3163bf466@att.net> Message-ID: Ken, On 9/27/2017 11:10 AM, Ken Shirriff via Unicode wrote: > The IBM type catalog might be of interest. It describes in great > detail the character sets of the IBM typewriters and line printers and > the custom characters that can be ordered for printer chains and > Selectric type balls. Link: > http://bitsavers.org/pdf/ibm/serviceForConsultants/Service_For_Consultants_198312_Complete/15_Type_Catalog.pdf > > That is a very interesting source, though from a much later era (1983). In particular, the "Special Character Nomenclature" (p. 
11 of the pdf) provides a good list of what the IBM typographers at the time thought was the range of special symbols they were working within this overall collection. Note the presence of the group mark, the record mark, and the segment mark. And in the realm of potential "tofu" indicators, there is the open box and the OCR blob, but nothing like the 1620 symbol(s) we've been talking about. On another point, the "pillow" noted for the invalid character in the IBM 1620-2 (using the Selectric instead of the older IBM typewriter model) was almost certainly also not an actual punch on the Selectric type ball, but instead implemented by an overstrike of "[" and "]". See, e.g., the Pica 72 type style in the catalog noted above, which looks like some of the very earliest Selectric type. Its use could well have been occasioned by the fact that the slab serif typewriter font would have created a muddy blob if you tried to overstrike an "X" and and "I" for this output symbol. --Ken From unicode at unicode.org Wed Sep 27 15:36:55 2017 From: unicode at unicode.org (Ken Shirriff via Unicode) Date: Wed, 27 Sep 2017 13:36:55 -0700 Subject: IBM 1620 invalid character symbol In-Reply-To: References: Message-ID: I checked with the Computer History Museum about the 1620. According to Dave Babcock, IBM 1620 Restoration Team Lead at the CHM: The 1620 console typewriter actually had a "zha" character typebar that it would use for unknown characters. The only overprinting that the typewriter would do was a "flag" mark [an "overscore" rather than an "underscore"] and a center-hyphen [used for characters with bad parity]. For both of these, it would first print the special character [flag or center-hyphen] without advancing the carriage, then print the other digit or alpha character. And yes, it was possible to get a "bad parity unknown character" which would print the center-hyphen and zha. The typewriter was not capable of backspacing to do any other overprinting. 
With the Wheelwriter-based console typewriter that we're using for the IBM 1620 Jr. we will be doing some real print-backspace-print overprinting to approximate some of the special characters, like zha. Ken Shirriff (the other Ken) On Mon, Sep 25, 2017 at 9:48 PM, Leo Broukhis via Unicode < unicode at unicode.org> wrote: > Wikipedia (https://en.wikipedia.org/wiki/IBM_1620#Invalid_character) > describes the "invalid character" symbol (see attachment) as a Cyrillic ? > which it obviously is not. > > But what is it? Does it deserve encoding, or is it a glyph variation of an > existing codepoint? > > The question is somewhat prompted by > > 2BFF 1 HELLSCHREIBER PAUSE SYMBOL > > in the pipeline, although I learned about both earlier today within a few > minutes of one another. > > Thanks, > Leo > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Wed Sep 27 15:49:35 2017 From: unicode at unicode.org (James Tauber via Unicode) Date: Wed, 27 Sep 2017 16:49:35 -0400 Subject: implicit weight base for U+2CEA2 Message-ID: I recently updated pyuca[1], my pure Python implementation of the Unicode Collation Algorithm to work with 8.0.0, 9.0.0, and 10.0.0 but to get all the tests to work, I had to special case the implicit weight base for U+2CEA2. The spec seems to suggest the base should be FB80 but I had to override just that code point to have a base of FBC0 for the tests to pass. Is this a known issue with the spec or something I've missed? James [1] https://github.com/jtauber/pyuca -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From unicode at unicode.org Wed Sep 27 16:08:15 2017 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Wed, 27 Sep 2017 23:08:15 +0200 Subject: IBM 1620 invalid character symbol In-Reply-To: References: <1ef6682e-c8f3-d570-d1c0-b684a66f6474@att.net> <047f2f71-14e6-f939-002c-c0cddb81aada@att.net> <7c14c57f-ba3d-5ddd-4124-a0905b6eef11@att.net> Message-ID: But it is not the case for this early computer, whose typewriter terminal is clearly not using interchangeable font balls but old metallic type on a "wheel of hammers". It is also clear that this typewriter (described in the manual) is not what was used to typeset the manual, which was produced with the more conventional typographic tools used by book editors. So we can't compare what is in the manual with what was actually printed (as described in the manual). 2017-09-27 19:02 GMT+02:00 Asmus Freytag via Unicode : > On 9/27/2017 9:32 AM, Ken Whistler via Unicode wrote: > > The only font on that machine can be found by feeling the key strikers in > the typewriter. > > In that context it's worth remembering that there while you could say for > most typewriters that "the typewriter is the font", there were noted > exceptions. The IBM Selectric, for example, had exchangeable type balls > which allowed both a font and / or encoding change. (Encoding understood > here as association of character to key). > > That technology was then only two years in the future. > > Other typewriters used interchangeable type wheels for the same purpose, > but I believe that generally came later. > > A./ > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From unicode at unicode.org Wed Sep 27 16:19:56 2017 From: unicode at unicode.org (Markus Scherer via Unicode) Date: Wed, 27 Sep 2017 14:19:56 -0700 Subject: implicit weight base for U+2CEA2 In-Reply-To: References: Message-ID: On Wed, Sep 27, 2017 at 1:49 PM, James Tauber via Unicode < unicode at unicode.org> wrote: > I recently updated pyuca[1], my pure Python implementation of the Unicode > Collation Algorithm to work with 8.0.0, 9.0.0, and 10.0.0 but to get all > the tests to work, I had to special case the implicit weight base for > U+2CEA2. The spec seems to suggest the base should be FB80 but I had to > override just that code point to have a base of FBC0 for the tests to pass. > > Is this a known issue with the spec or something I've missed? > 2CEA2..2CEAF are unassigned code points for which the UCA+DUCET uses a base of FBC0. markus -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Wed Sep 27 16:29:38 2017 From: unicode at unicode.org (Ken Whistler via Unicode) Date: Wed, 27 Sep 2017 14:29:38 -0700 Subject: implicit weight base for U+2CEA2 In-Reply-To: References: Message-ID: <651c8934-4d44-f46d-17a2-cf64113763de@att.net> On 9/27/2017 2:19 PM, Markus Scherer via Unicode wrote: > On Wed, Sep 27, 2017 at 1:49 PM, James Tauber via Unicode > > wrote: > > I recently updated pyuca[1], my pure Python implementation of the > Unicode Collation Algorithm to work with 8.0.0, 9.0.0, and 10.0.0 > but to get all the tests to work, I had to special case the > implicit weight base for U+2CEA2. The spec seems to suggest the > base should be FB80 but I had to override just that code point to > have a base of FBC0 for the tests to pass. > > Is this a known issue with the spec or something I've missed? > > > 2CEA2..2CEAF are unassigned code points for which the UCA+DUCET uses a > base of FBC0. > > markus And you may have a range error in Extension E to account for the test problem. 
The relevant section of CollationTest_SHIFTED_SHORT.txt has tests that will pass only if:

2B735 < 2B81E < 2CEA2 < 2EBE1 < 2FFFE
Ext C < Ext D < Ext E < Ext F < non-character

Those are *unassigned* characters just past the assigned ranges but still in the blocks in each of those CJK extensions. So if you have a range error for assigned characters in Extension E, you'd get a failure at that point in the test cases. --Ken -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Wed Sep 27 17:48:17 2017 From: unicode at unicode.org (Ken Shirriff via Unicode) Date: Wed, 27 Sep 2017 15:48:17 -0700 Subject: IBM 1620 invalid character symbol In-Reply-To: References: Message-ID: More information from Tim Coslet of the Computer History Museum 1620 Team: The Model I printed a Cyrillic Ж for invalid character codes. The width of the Cyrillic Ж was narrower than shown at left, so that it matched the width of other characters the typewriter typed. The Model II printed a character called "pillow" for invalid character codes. "Pillow" was a solid black rectangle. Ken Shirriff (the other Ken) On Wed, Sep 27, 2017 at 1:36 PM, Ken Shirriff wrote: > I checked with the Computer History Museum about the 1620. According to Dave > Babcock, IBM 1620 Restoration Team Lead at the CHM: > > The 1620 console typewriter actually had a "zha" character typebar that > it would use for unknown characters. > > The only overprinting that the typewriter would do was a "flag" mark [an > "overscore" rather than an "underscore"] and a center-hyphen [used for > characters with bad parity]. For both of these, it would first print > the special character [flag or center-hyphen] without advancing the > carriage, then print the other digit or alpha character. > > And yes, it was possible to get a "bad parity unknown character" which > would print the center-hyphen and zha. > > The typewriter was not capable of backspacing to do any other overprinting.
> > With the Wheelwriter-based console typewriter that we're using for the > IBM 1620 Jr. we will be doing some real print-backspace-print > overprinting to approximate some of the special characters, like zha. > > Ken Shirriff (the other Ken) > > On Mon, Sep 25, 2017 at 9:48 PM, Leo Broukhis via Unicode < > unicode at unicode.org> wrote: > >> Wikipedia (https://en.wikipedia.org/wiki/IBM_1620#Invalid_character) >> describes the "invalid character" symbol (see attachment) as a Cyrillic ? >> which it obviously is not. >> >> But what is it? Does it deserve encoding, or is it a glyph variation of >> an existing codepoint? >> >> The question is somewhat prompted by >> >> 2BFF 1 HELLSCHREIBER PAUSE SYMBOL >> >> in the pipeline, although I learned about both earlier today within a few >> minutes of one another. >> >> Thanks, >> Leo >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Wed Sep 27 18:07:00 2017 From: unicode at unicode.org (James Tauber via Unicode) Date: Wed, 27 Sep 2017 19:07:00 -0400 Subject: implicit weight base for U+2CEA2 In-Reply-To: <651c8934-4d44-f46d-17a2-cf64113763de@att.net> References: <651c8934-4d44-f46d-17a2-cf64113763de@att.net> Message-ID: Ah yes, I was just going by membership in the CJK Unified Ideographs Extension E block, not actual assignment. So the lack of assignment means it should fail the Unified_Ideograph membership in http://unicode.org/reports/tr10/#Values_For_Base_Table Got it! 
Thanks James On Wed, Sep 27, 2017 at 5:29 PM, Ken Whistler via Unicode < unicode at unicode.org> wrote: > > > On 9/27/2017 2:19 PM, Markus Scherer via Unicode wrote: > > On Wed, Sep 27, 2017 at 1:49 PM, James Tauber via Unicode < > unicode at unicode.org> wrote: > >> I recently updated pyuca[1], my pure Python implementation of the Unicode >> Collation Algorithm to work with 8.0.0, 9.0.0, and 10.0.0 but to get all >> the tests to work, I had to special case the implicit weight base for >> U+2CEA2. The spec seems to suggest the base should be FB80 but I had to >> override just that code point to have a base of FBC0 for the tests to pass. >> >> Is this a known issue with the spec or something I've missed? >> > > 2CEA2..2CEAF are unassigned code points for which the UCA+DUCET uses a > base of FBC0. > > markus > > > And you may have a range error in Extension E to account for the test > problem. > > The relevant section of CollationTest_SHIFTED_SHORT.txt has tests that > will pass only if: > > 2B735 < 2B81E < 2CEA2 < 2EBE1 < 2FFFE > Ext C < Ext D < Ext E < Ext F < non-character > > Those are *unassigned* characters just past the assigned ranges but still > in the blocks in each of those CJK extensions. So if you have a range error > for assigned characters in Extension E, you'd get a failure at that point > in the text cases. > > --Ken > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Wed Sep 27 18:54:22 2017 From: unicode at unicode.org (Markus Scherer via Unicode) Date: Wed, 27 Sep 2017 16:54:22 -0700 Subject: implicit weight base for U+2CEA2 In-Reply-To: References: <651c8934-4d44-f46d-17a2-cf64113763de@att.net> Message-ID: On Wed, Sep 27, 2017 at 4:07 PM, James Tauber wrote: > Ah yes, I was just going by membership in the CJK Unified Ideographs > Extension E block, not actual assignment. 
>
> So the lack of assignment means it should fail the Unified_Ideograph
> membership in http://unicode.org/reports/tr10/#Values_For_Base_Table
>

Right.

http://www.unicode.org/Public/UCD/latest/ucd/PropList.txt

3400..4DB5    ; Unified_Ideograph # Lo [6582] CJK UNIFIED IDEOGRAPH-3400..CJK UNIFIED IDEOGRAPH-4DB5
4E00..9FEA    ; Unified_Ideograph # Lo [20971] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FEA
FA0E..FA0F    ; Unified_Ideograph # Lo [2] CJK COMPATIBILITY IDEOGRAPH-FA0E..CJK COMPATIBILITY IDEOGRAPH-FA0F
FA11          ; Unified_Ideograph # Lo CJK COMPATIBILITY IDEOGRAPH-FA11
FA13..FA14    ; Unified_Ideograph # Lo [2] CJK COMPATIBILITY IDEOGRAPH-FA13..CJK COMPATIBILITY IDEOGRAPH-FA14
FA1F          ; Unified_Ideograph # Lo CJK COMPATIBILITY IDEOGRAPH-FA1F
FA21          ; Unified_Ideograph # Lo CJK COMPATIBILITY IDEOGRAPH-FA21
FA23..FA24    ; Unified_Ideograph # Lo [2] CJK COMPATIBILITY IDEOGRAPH-FA23..CJK COMPATIBILITY IDEOGRAPH-FA24
FA27..FA29    ; Unified_Ideograph # Lo [3] CJK COMPATIBILITY IDEOGRAPH-FA27..CJK COMPATIBILITY IDEOGRAPH-FA29
20000..2A6D6  ; Unified_Ideograph # Lo [42711] CJK UNIFIED IDEOGRAPH-20000..CJK UNIFIED IDEOGRAPH-2A6D6
2A700..2B734  ; Unified_Ideograph # Lo [4149] CJK UNIFIED IDEOGRAPH-2A700..CJK UNIFIED IDEOGRAPH-2B734
2B740..2B81D  ; Unified_Ideograph # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
2B820..2CEA1  ; Unified_Ideograph # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
2CEB0..2EBE0  ; Unified_Ideograph # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0

# Total code points: 87882

https://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%3AUnified_Ideograph%3A%5D&abb=on&g=&i=

markus
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
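
[Editorial illustration, not part of the original thread.] The point made above, that the implicit weight base depends on the Unified_Ideograph property and not on block membership, can be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the range table is copied from the PropList.txt excerpt quoted above (Unicode 10.0.0), the two "core" block ranges and the AAAA/BBBB formula follow my reading of the UTS #10 section linked above, and the Tangut and Nushu special cases are deliberately left out. The names `implicit_base` and `implicit_weights` are made up for this example and are not pyuca's API.

```python
# Unified_Ideograph ranges, taken from the PropList.txt excerpt quoted above
# (Unicode 10.0.0). Other Unicode versions need a different table.
UNIFIED_IDEOGRAPH_RANGES = [
    (0x3400, 0x4DB5), (0x4E00, 0x9FEA), (0xFA0E, 0xFA0F), (0xFA11, 0xFA11),
    (0xFA13, 0xFA14), (0xFA1F, 0xFA1F), (0xFA21, 0xFA21), (0xFA23, 0xFA24),
    (0xFA27, 0xFA29), (0x20000, 0x2A6D6), (0x2A700, 0x2B734),
    (0x2B740, 0x2B81D), (0x2B820, 0x2CEA1), (0x2CEB0, 0x2EBE0),
]

# Assumption: the blocks CJK Unified Ideographs and CJK Compatibility
# Ideographs, which select the FB40 base instead of FB80.
CORE_BLOCKS = [(0x4E00, 0x9FFF), (0xF900, 0xFAFF)]


def _in_ranges(cp, ranges):
    """Return True if cp falls inside any inclusive (first, last) range."""
    return any(first <= cp <= last for first, last in ranges)


def implicit_base(cp):
    """Pick the implicit weight base for a code point with no DUCET entry.

    Hypothetical helper: Tangut and Nushu, which have their own bases,
    are not handled here.
    """
    if _in_ranges(cp, UNIFIED_IDEOGRAPH_RANGES):
        return 0xFB40 if _in_ranges(cp, CORE_BLOCKS) else 0xFB80
    # Everything else, including unassigned code points inside a CJK
    # extension block such as U+2CEA2.
    return 0xFBC0


def implicit_weights(cp):
    """Return the (AAAA, BBBB) primary weight pair for cp."""
    aaaa = implicit_base(cp) + (cp >> 15)
    bbbb = (cp & 0x7FFF) | 0x8000
    return aaaa, bbbb


if __name__ == "__main__":
    for cp in (0x2CEA1, 0x2CEA2, 0x2CEB0):
        aaaa, bbbb = implicit_weights(cp)
        print("U+%04X -> [.%04X.0020.0002][.%04X.0000.0000]" % (cp, aaaa, bbbb))
```

With this table, U+2CEA1 (the last assigned Extension E ideograph in 10.0.0) gets the FB80 base while U+2CEA2 falls through to FBC0, which matches the override James describes. A check on the whole block range 2B820..2CEAF would give FB80 for U+2CEA2 and so break the ordering Ken lists.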