Re: “plain text styling”…
Harriet Riddle
harjitmoe at outlook.com
Thu Jan 12 12:44:42 CST 2023
Kent Karlsson via Unicode wrote:
> And SS1 would be a no-op? A bit like NULL was intended to be…
>
> SS2 is ”jump to secondary codepage”, SS3 is ”jump to tertiary codepage”.
>
> /Kent K
---
Not quite. ECMA-48 is written strictly within the confines of ECMA-35,
and the semantics of SS2 and SS3 are described in more detail there.
But in brief: it specifies that the next code, encoded using either the
0x20–7F range or the 0xA0–FF range (confusingly, it specifies this
twice, once for 7-bit encodings (with only the 0x20–7F range) and again
for 8-bit encodings (permitting either but not a mixture), but I
digress), is sourced from the set designated as the G2 set. This is as
opposed to the G0 (Shift In), G1 (Shift Out) or G3 (SS3) sets. So SS1,
in the context of ECMA-35, would be a non-locking shift to the same
codepage that Shift Out / SO is a locking shift to. Even a hypothetical
SS0 wouldn't be a no-op, though, since it would allow accessing
characters in the G0 set without leaving Shift Out state.
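To make the locking versus single-shift distinction concrete, here is a rough sketch of how a decoder for an 8-bit code might track which set each graphic byte is sourced from. It is a simplification, not ECMA-35 in full: it assumes single-byte sets only, takes whatever is invoked over 0xA0–FF as a fixed parameter, ignores all other control codes and escape sequences, and doesn't special-case 0x20 and 0x7F. The name trace_sets is just something I made up for illustration.

```python
SO, SI = 0x0E, 0x0F      # locking shifts: invoke G1 / G0 over 0x20-7F
SS2, SS3 = 0x8E, 0x8F    # single shifts (8-bit C1 representations)

def trace_sets(data: bytes, gr: str = "G1") -> list[tuple[str, int]]:
    """Return (set name, 7-bit position) for each graphic byte.

    Assumption: `gr` names the set invoked over 0xA0-FF and never changes.
    """
    gl = "G0"        # set currently invoked over 0x20-7F
    pending = None   # set chosen by a single shift, for the next code only
    out = []
    for b in data:
        if b == SO:
            gl = "G1"
        elif b == SI:
            gl = "G0"
        elif b == SS2:
            pending = "G2"
        elif b == SS3:
            pending = "G3"
        elif 0x20 <= b <= 0x7F or 0xA0 <= b <= 0xFF:
            if pending is not None:
                # A single shift overrides the invoked set exactly once.
                source, pending = pending, None
            elif b < 0x80:
                source = gl
            else:
                source = gr
            out.append((source, b & 0x7F))
        # Any other control code is ignored in this sketch.
    return out
```

For instance, trace_sets(b"\x0e\x8eCB") sources "C" from G2 and then "B" from G1, since the single shift doesn't disturb the Shift Out state.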
(For reasons I can't quite fathom, both ECMA-35 and ECMA-48 make a punt
at pretending that LS1 and LS0 (Locking Shifts 1 and 0) in an 8-bit code
aren't the same thing as SO and SI in a 7-bit code, even though they do
exactly the same thing and are coded at the same positions. While I
cannot read the committee's mind on that, my only guess is that it's
either to better correlate them with LS1R, to clarify that they only
operate on ASCII bytes rather than all non‑control bytes (as opposed to
in EBCDIC, where SO and SI operate on the entire 0x41–FE range), or both.)
I suppose you could think of G2 (the SS2 set) as the "second
supplementary codepage", where the Shift Out (G1) set is the "first
supplementary codepage". Treating the G0 set as a "primary codepage"
versus the three "supplementary codepages" is not /explicitly/ done by
ECMA-35, but the G0 set is effectively treated differently than the
other three, since it cannot include 0x20 or 0x7F, and cannot be
shift-invoked (locking or otherwise) over 0xA0–FF, while the other three
can be 96-code sets and can be shift-invoked over either range.
As a sidenote: nominally (in accordance with their derivation from
ECMA-43), ISO 8859 encodings have ASCII in G0 and their supplements in
G1. In practice, this might not be the case in Unix terminal
contexts, since software may expect Shift Out to switch to DEC Special
Graphics ("DECgraphics"), which can be worked around by including the
supplement in G2 instead, and invoking G2 over 0xA0–FF (i.e. in LS2R
state rather than LS1R state).
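As a sketch of what that workaround looks like on the wire, the following spells out the relevant escape sequences for the ISO 8859-1 case (the 96-set with final byte "A"). The constant names are my own, and whether any given terminal emulator actually honours these designations and shifts is an assumption you'd need to check.

```python
ESC = b"\x1b"

# Nominal ISO 8859-1 arrangement: supplement designated to G1, G1 over 0xA0-FF.
DESIGNATE_LATIN1_G1 = ESC + b"-A"   # 96-set with final byte "A" to G1
LS1R = ESC + b"~"                   # Locking Shift One Right

# Workaround arrangement: DEC Special Graphics stays in G1 for Shift Out,
# while the supplement goes to G2 and G2 is invoked over 0xA0-FF.
DESIGNATE_DEC_GRAPHICS_G1 = ESC + b")0"  # DEC private final byte "0"
DESIGNATE_LATIN1_G2 = ESC + b".A"        # same 96-set, designated to G2
LS2R = ESC + b"}"                        # Locking Shift Two Right

# Bytes a program might send once at startup (hypothetical usage).
PREAMBLE = DESIGNATE_DEC_GRAPHICS_G1 + DESIGNATE_LATIN1_G2 + LS2R
```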
--Har.