From unicode at unicode.org Sun Dec 2 03:33:37 2018
From: unicode at unicode.org (Hans Åberg via Unicode)
Date: Sun, 2 Dec 2018 10:33:37 +0100
Subject: A sign/abbreviation for "magister"
In-Reply-To:
References: <20181030105122.665a7a7059d7ee80bb4d670165c8327d.6605d392f3.wbe@email03.godaddy.com>
Message-ID: <031F097F-1FA9-4439-A477-CFD229E00BCC@telia.com>

> On 30 Oct 2018, at 22:50, Ken Whistler via Unicode wrote:
>
> On 10/30/2018 2:32 PM, James Kass via Unicode wrote:
>> but we can't seem to agree on how to encode its abbreviation.
>
> For what it's worth, "mgr" seems to be the usual abbreviation in Polish for it.

It was common in the 1800s to singly and doubly underline superscript abbreviations in handwriting according to [1-2], and [2] also mentions the abbreviation discussed in this thread.

1. https://en.wikipedia.org/wiki/Ordinal_indicator
2. https://en.wikipedia.org/wiki/Ordinal_indicator#cite_note-1

From unicode at unicode.org Sun Dec 2 13:29:31 2018
From: unicode at unicode.org (Janusz S. Bień via Unicode)
Date: Sun, 02 Dec 2018 20:29:31 +0100
Subject: Update to the second question summary (was: A sign/abbreviation for "magister")
In-Reply-To: <031F097F-1FA9-4439-A477-CFD229E00BCC@telia.com> ("Hans Åberg via Unicode"'s message of "Sun, 2 Dec 2018 10:33:37 +0100")
References: <20181030105122.665a7a7059d7ee80bb4d670165c8327d.6605d392f3.wbe@email03.godaddy.com> <031F097F-1FA9-4439-A477-CFD229E00BCC@telia.com>
Message-ID: <86pnujk9r8.fsf_-_@mimuw.edu.pl>

On Sun, Dec 02 2018 at 10:33 +0100, Hans Åberg via Unicode wrote:
>> On 30 Oct 2018, at 22:50, Ken Whistler via Unicode wrote:
>>
>> On 10/30/2018 2:32 PM, James Kass via Unicode wrote:
>>> but we can't seem to agree on how to encode its abbreviation.
>>
>> For what it's worth, "mgr" seems to be the usual abbreviation in Polish for it.
>
> It was common in the 1800s to singly and doubly underline superscript
> abbreviations in handwriting according to [1-2], and [2] also mentions
> the abbreviation discussed in this thread.

Thank you very much for this reference to the very abbreviation! I looked up Wikipedia but I haven't read it carefully enough :-(

>
> 1. https://en.wikipedia.org/wiki/Ordinal_indicator
> 2. https://en.wikipedia.org/wiki/Ordinal_indicator#cite_note-1

Best regards

Janusz

--
Janusz S. Bień
emeryt (emeritus)
https://sites.google.com/view/jsbien

From unicode at unicode.org Sun Dec 2 15:52:35 2018
From: unicode at unicode.org (Hans Åberg via Unicode)
Date: Sun, 2 Dec 2018 22:52:35 +0100
Subject: Update to the second question summary (was: A sign/abbreviation for "magister")
In-Reply-To: <86pnujk9r8.fsf_-_@mimuw.edu.pl>
References: <20181030105122.665a7a7059d7ee80bb4d670165c8327d.6605d392f3.wbe@email03.godaddy.com> <031F097F-1FA9-4439-A477-CFD229E00BCC@telia.com> <86pnujk9r8.fsf_-_@mimuw.edu.pl>
Message-ID:

> On 2 Dec 2018, at 20:29, Janusz S. Bień via Unicode wrote:
>
> On Sun, Dec 02 2018 at 10:33 +0100, Hans Åberg via Unicode wrote:
>>
>> It was common in the 1800s to singly and doubly underline superscript
>> abbreviations in handwriting according to [1-2], and [2] also mentions
>> the abbreviation discussed in this thread.
>
> Thank you very much for this reference to the very abbreviation! I
> looked up Wikipedia but I haven't read it carefully enough :-(

Quite a coincidence, as I was looking at the article topic, and it happened to have this remark embedded!

>> 1. https://en.wikipedia.org/wiki/Ordinal_indicator
>> 2. https://en.wikipedia.org/wiki/Ordinal_indicator#cite_note-1

From unicode at unicode.org Sun Dec 9 16:59:03 2018
From: unicode at unicode.org (Joe Becker via Unicode)
Date: Sun, 9 Dec 2018 14:59:03 -0800
Subject: 50th Anniversary of the "Mother of all Demos"
Message-ID: <5C0D9E37.6070106@unicode.org>

...
"given at the Association for Computing Machinery / Institute of Electrical and Electronics Engineers (ACM/IEEE)?Computer Society's Fall Joint Computer Conference in San Francisco, which was presented by Douglas Engelbart [with Bill English at the Stanford Research Institute end in Menlo Park] on December 9, 1968." See e.g.: https://en.wikipedia.org/wiki/The_Mother_of_All_Demos https://web.stanford.edu/dept/SUL/library/extra4/sloan/mousesite/dce1968conferenceannouncement.jpg This event presented the framework for modern computing, including Unicode and much else. Joe From unicode at unicode.org Mon Dec 10 04:06:19 2018 From: unicode at unicode.org (Henri Sivonen via Unicode) Date: Mon, 10 Dec 2018 12:06:19 +0200 Subject: Generating U+FFFD when there's no content between ISO-2022-JP escape sequences In-Reply-To: References: Message-ID: We're about to remove the U+FFFD generation for the case where there is no content between two ISO-2022-JP escape sequences from the WHATWG Encoding Standard. Is there anything wrong with my analysis that U+FFFD generation in that case is not a useful security measure when unnecessary transitions between the ASCII and Roman states do not generate U+FFFD? On Thu, Nov 22, 2018 at 1:08 PM Henri Sivonen wrote: > > Context: https://github.com/whatwg/encoding/issues/115 > > Unicode Security Considerations say: > "3.6.2 Some Output For All Input > > Character encoding conversion must also not simply skip an illegal > input byte sequence. Instead, it must stop with an error or substitute > a replacement character (such as U+FFFD ( ? ) REPLACEMENT CHARACTER) > or an escape sequence in the output. (See also Section 3.5 Deletion of > Code Points.) It is important to do this not only for byte sequences > that encode characters, but also for unrecognized or "empty" > state-change sequences. For example: > [...] > ISO-2022 shift sequences without text characters before the next shift > sequence. 
> The formal syntaxes for HZ and most CJK ISO-2022 variants
> require at least one character in a text segment between shift
> sequences. Security software written to the formal specification may
> not detect malicious text (for example, "delete" with a
> shift-to-double-byte then an immediate shift-to-ASCII in the middle)."
> (https://www.unicode.org/reports/tr36/#Some_Output_For_All_Input)
>
> The WHATWG Encoding Standard bakes in this requirement by means of the
> "ISO-2022-JP output flag"
> (https://encoding.spec.whatwg.org/#iso-2022-jp-output-flag) in its
> ISO-2022-JP decoder algorithm
> (https://encoding.spec.whatwg.org/#iso-2022-jp-decoder).
>
> encoding_rs (https://github.com/hsivonen/encoding_rs) implements the
> WHATWG spec.
>
> After Gecko switched to encoding_rs from an implementation that didn't
> implement this U+FFFD generation behavior (uconv), a bug has been
> logged in the context of decoding Japanese email in Thunderbird:
> https://bugzilla.mozilla.org/show_bug.cgi?id=1508136
>
> Ken Lunde also recalls seeing such email:
> https://github.com/whatwg/encoding/issues/115#issuecomment-440661403
>
> The root problem seems to be that the requirement gives ISO-2022-JP
> the unusual and surprising property that concatenating two ISO-2022-JP
> outputs from a conforming encoder can result in a byte sequence that
> is non-conforming as input to an ISO-2022-JP decoder.
>
> Microsoft Edge and IE don't generate U+FFFD when an ISO-2022-JP escape
> sequence is immediately followed by another ISO-2022-JP escape
> sequence. Chrome and Safari do, but their implementations of
> ISO-2022-JP aren't independent of each other. Moreover, Chrome's
> decoder implementations generally are informed by the Encoding
> Standard (though the ISO-2022-JP decoder specifically might not be
> yet), and I suspect that Safari's implementation (ICU) is either
> informed by Unicode Security Considerations or vice versa.
>
> The example given as rationale in Unicode Security Considerations,
> obfuscating the ASCII string "delete", could be accomplished by
> alternating between the ASCII and Roman states so that every other
> character is in the ASCII state and the rest in the Roman state.
>
> Is the requirement to generate U+FFFD when there is no content between
> ISO-2022-JP escape sequences useful if useless ASCII-to-ASCII
> transitions or useless transitions between ASCII and Roman are not
> also required to generate U+FFFD? Would it even be feasible (in terms
> of interop with legacy encoders) to make useless transitions between
> ASCII and Roman generate U+FFFD?
>
> --
> Henri Sivonen
> hsivonen at hsivonen.fi
> https://hsivonen.fi/

--
Henri Sivonen
hsivonen at hsivonen.fi
https://hsivonen.fi/

From unicode at unicode.org Mon Dec 10 05:14:55 2018
From: unicode at unicode.org (Mark Davis ☕️ via Unicode)
Date: Mon, 10 Dec 2018 12:14:55 +0100
Subject: Generating U+FFFD when there's no content between ISO-2022-JP escape sequences
In-Reply-To:
References:
Message-ID:

I tend to agree with your analysis that emitting U+FFFD when there is no content between escapes in "shifting" encodings like ISO-2022-JP is unnecessary, and for consistency between implementations should not be recommended.

Can you file this at http://www.unicode.org/reporting.html so that the committee can look at your proposal with an eye to changing http://www.unicode.org/reports/tr36/?

Mark

On Mon, Dec 10, 2018 at 11:10 AM Henri Sivonen via Unicode <unicode at unicode.org> wrote:

> We're about to remove the U+FFFD generation for the case where there
> is no content between two ISO-2022-JP escape sequences from the WHATWG
> Encoding Standard.
>
> Is there anything wrong with my analysis that U+FFFD generation in
> that case is not a useful security measure when unnecessary
> transitions between the ASCII and Roman states do not generate U+FFFD?
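
The argument in this thread can be sketched in a few lines of Python. This is an illustrative sketch only, not code from encoding_rs, ICU, or the Encoding Standard: `empty_segments` and `alternate_states` are hypothetical helpers, and the only library call is Python's built-in `iso2022_jp` codec, which is just one of the many implementations that may differ on edge cases. It contrasts two inputs: a plain concatenation of two valid encoder outputs, and "delete" written with an ASCII/Roman switch after every character.

```python
# Hypothetical helpers for illustration; only str.encode/bytes.decode are real APIs.
ESC_ASCII, ESC_ROMAN = b"\x1b(B", b"\x1b(J"
# The five escape sequences the ISO-2022-JP decoder recognizes.
ESCAPES = (ESC_ASCII, ESC_ROMAN, b"\x1b(I", b"\x1b$@", b"\x1b$B")


def empty_segments(data: bytes) -> list:
    """Offsets of escape sequences that directly follow another escape sequence."""
    offsets, i, after_escape = [], 0, False
    while i < len(data):
        if data[i:i + 3] in ESCAPES:
            if after_escape:
                offsets.append(i)  # no content since the previous escape
            after_escape = True
            i += 3
        else:
            after_escape = False
            i += 1
    return offsets


def alternate_states(word: str) -> bytes:
    """Encode ASCII text, toggling between the ASCII and Roman states per character."""
    out, in_roman = bytearray(), False
    for i, byte in enumerate(word.encode("ascii")):
        if i:  # decoding starts in the ASCII state, so the first byte needs no escape
            in_roman = not in_roman
            out += ESC_ROMAN if in_roman else ESC_ASCII
        out.append(byte)
    if in_roman:
        out += ESC_ASCII  # finish in the ASCII state
    return bytes(out)


part = "\u3042".encode("iso2022_jp")  # one hiragana letter from a stock encoder
joined = part + part                  # two valid outputs, simply concatenated
hidden = alternate_states("delete")

print(empty_segments(part))         # a single encoder output
print(empty_segments(joined))       # the join places two escapes back to back
print(b"delete" in hidden)          # byte-level search for the word
print(empty_segments(hidden))       # every escape here is followed by content
print(hidden.decode("iso2022_jp"))  # what this particular decoder recovers
```

Under these assumptions, a rule that flags only empty segments would flag the harmless concatenation and let the alternating form through, which is the asymmetry the question above points at.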

From unicode at unicode.org Mon Dec 10 14:12:50 2018
From: unicode at unicode.org (Shawn Steele via Unicode)
Date: Mon, 10 Dec 2018 20:12:50 +0000
Subject: Generating U+FFFD when there's no content between ISO-2022-JP escape sequences
In-Reply-To:
References:
Message-ID:

IMO, trying to do security checks on an encoded string that will be decoded later is pretty much guaranteed to miss cases. Particularly with ISO-2022-JP, which has a plethora of variations in how different software/libraries/OS's decode it and treat the invalid/edge cases.

I typically encourage security checks on encodings to be done after the translation to Unicode has been done, but that only works if the Unicode stream itself is what is being checked. E.g., a firewall may not decode it the same way as the end recipient of the data.

Which I guess is the point of the encoding project, but... nobody can guarantee that an endpoint conforms to any "standard", so from a security perspective the recommended guidance is pretty much moot; secure applications have to consider non-conforming behavior of endpoints as well.

Providing a "best practice" or suggestions in a standard is nice, but in practice systems are going to have differing interpretations and behaviors. Applications can't "depend" on any consistency. Even if all the standard documents agreed, there'd still be legacy implementations that people didn't update for whatever reason, and other implementations would miss some of the subtleties (or less subtle differences) of the standards.

IMO, all of the "state shifting" encodings should be treated with care by software.
There are many ways to encode the same or similar strings, and you never know what kind of validation happened "on the other end". It's pretty much a given that ISO-2022-JP, particularly its edge cases, is going to be interpreted differently by different applications.

-Shawn

From unicode at unicode.org Fri Dec 14 16:48:47 2018
From: unicode at unicode.org (Craig, David O via Unicode)
Date: Fri, 14 Dec 2018 22:48:47 +0000
Subject: Japan may not announce new era name until April 11
Message-ID:

Note the recent article in The Japan Times:

https://www.japantimes.co.jp/news/2018/12/06/national/politics-diplomacy/japan-mulls-announcing-new-era-name-april-11-sources/#.XBQPC6qWxD8

April 11 leaves less than three weeks before the May 1 ascension.

David Craig

From unicode at unicode.org Fri Dec 14 17:24:47 2018
From: unicode at unicode.org (Phake Nick via Unicode)
Date: Sat, 15 Dec 2018 07:24:47 +0800
Subject: Japan may not announce new era name until April 11
In-Reply-To:
References:
Message-ID:

"Until April 11 or later". As in, after a certain commemoration ceremony that will take place on April 10. According to reports, the reduction in the notification period is meant to be a concession to conservative legislators within the ruling party who don't want any prior announcement at all.

2018-12-15 06:51, Craig, David O via Unicode wrote:

> Note the recent article in the japan times:
>
> https://www.japantimes.co.jp/news/2018/12/06/national/politics-diplomacy/japan-mulls-announcing-new-era-name-april-11-sources/#.XBQPC6qWxD8
>
> April 11 leaves less than three weeks before the May 1 ascension.
>
> David Craig