Re: “plain text styling”…

Thu Jan 12 12:44:42 CST 2023

Kent Karlsson via Unicode wrote:
> And SS1 would be a no-op? A bit like NULL was intended to be…
>
> SS2 is ”jump to secondary codepage”, SS3 is ”jump to tertiary codepage”.
>
> /Kent K

---

Not quite.  ECMA-48 is written strictly within the confines of ECMA-35, 
and the semantics of SS2 and SS3 are described in more detail there.  
But in brief: SS2 specifies that the next code is sourced from the set 
designated as the G2 set, whether that code is encoded using the 
0x20–7F range or the 0xA0–FF range.  (Confusingly, ECMA-35 specifies 
this twice: once for 7-bit encodings, with only the 0x20–7F range, and 
again for 8-bit encodings, permitting either range but not a mixture.  
But I digress.)  This is as opposed to the G0 (Shift In), G1 (Shift 
Out) or G3 (SS3) sets.  So SS1, in the context of ECMA-35, would be a 
non-locking shift to the same codepage that Shift Out / SO is a locking 
shift to.  Even SS0 wouldn't be a no-op, since it would allow accessing 
characters in the G0 set without leaving Shift Out state.
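
To make the locking versus non-locking distinction concrete, here is a 
rough sketch in Python.  It is not a real ECMA-35 decoder: the function 
name is made up, it assumes the 8-bit codings of SS2 and SS3 (0x8E and 
0x8F), it assumes G0 starts in 0x20–7F and G1 in 0xA0–FF, and it ignores 
designation escapes and the special status of 0x20 and 0x7F.  It only 
reports which G-set each graphic byte would be sourced from.

```python
# Illustrative only: trace_shifts is a hypothetical helper, not an API.
SO, SI = 0x0E, 0x0F      # locking shifts over 0x20-7F (G1 / G0)
SS2, SS3 = 0x8E, 0x8F    # single shifts, 8-bit codings (assumed here)

def trace_shifts(data: bytes) -> list:
    """Return (set name, 7-bit position) for each graphic byte."""
    gl, gr = "G0", "G1"  # assumed initial invocations
    single = None        # set chosen by a pending single shift, if any
    out = []
    for b in data:
        if b == SO:
            gl = "G1"            # persists until the next locking shift
        elif b == SI:
            gl = "G0"
        elif b == SS2:
            single = "G2"        # applies to the next graphic code only
        elif b == SS3:
            single = "G3"
        elif 0x20 <= b <= 0x7F or 0xA0 <= b <= 0xFF:
            locked = gl if b < 0x80 else gr
            out.append((single or locked, b & 0x7F))
            single = None        # single shift is used up
        # any other control code is ignored in this sketch
    return out
```

For example, `trace_shifts(bytes([0x0E, 0x41, 0x8E, 0x41, 0x41]))` 
reports G1, then G2, then G1 again: the SS2 borrows one code from G2 
without leaving Shift Out state.  A hypothetical SS1 or SS0 would just 
set `single` to "G1" or "G0" in the same way.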

(For reasons I can't quite fathom, both ECMA-35 and ECMA-48 make a punt 
at pretending that LS1 and LS0 (Locking Shifts 1 and 0) in an 8-bit code 
aren't the same thing as SO and SI in a 7-bit code, even though they do 
exactly the same thing and are coded at the same positions.  While I 
cannot read the committee's mind on that, my only guess is that it's 
either to better correlate them with LS1R, to clarify that they only 
operate on ASCII bytes rather than all non‑control bytes (as opposed to 
in EBCDIC, where SO and SI operate on the entire 0x41–FE range), or both.)

I suppose you could think of G2 (the SS2 set) as the "second 
supplementary codepage", where the Shift Out (G1) set is the "first 
supplementary codepage".  Treating the G0 set as a "primary codepage" 
versus the three "supplementary codepages" is not /explicitly/ done by 
ECMA-35, but the G0 set is effectively treated differently than the 
other three, since it cannot include 0x20 or 0x7F, and cannot be 
shift-invoked (locking or otherwise) over 0xA0–FF, while the other three 
can be 96-code sets and can be shift-invoked over either range.

As a sidenote: nominally (in accordance with their derivation from 
ECMA-43), ISO 8859 encodings have ASCII in G0 and their supplements in 
G1.  In practice, this might not be the case in a Unix terminal 
context, since software may expect Shift Out to switch to DEC Special 
Graphics ("DECgraphics"), which can be worked around by including the 
supplement in G2 instead, and invoking G2 over 0xA0–FF (i.e. in LS2R 
state rather than LS1R state).
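
As a rough illustration of that workaround, the byte stream might be 
built along the following lines.  The escape sequences are my reading of 
ECMA-35 (ESC 0x2E F designating a 96-set to G2, with final byte "A" for 
the Latin-1 supplement, and ESC 0x7D for LS2R); the function name is 
invented, and whether a given terminal honours any of this is a separate 
question.

```python
# Sketch under stated assumptions; latin1_via_g2 is a hypothetical name.
ESC = b"\x1b"
DESIGNATE_LATIN1_TO_G2 = ESC + b".A"  # 96-set to G2; "A" = Latin-1 supplement
LS2R = ESC + b"}"                     # invoke G2 over 0xA0-FF

def latin1_via_g2(text: str) -> bytes:
    """Prefix Latin-1 bytes so that 0xA0-FF is read from G2, not G1."""
    return DESIGNATE_LATIN1_TO_G2 + LS2R + text.encode("latin-1")
```

This leaves G1 free for whatever Shift Out is expected to bring in, while 
bytes in 0xA0–FF still come out as the ISO 8859-1 supplement.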

--Har.
