Re: “plain text styling”…
Harriet Riddle
harjitmoe at outlook.com
Thu Jan 12 13:05:36 CST 2023
Kent Karlsson via Unicode wrote:
> I did. It is misnamed there and for Teletext. And yes, I know there were other ESC sequence definitions before ECMA-48, which still were ESC sequences, not ”jumping” to another codepage.
EBCDIC's single-shift is called GE (graphic escape). Whereas EBCDIC's SI
and SO are mostly used for switching between single-byte and double-byte
pages in a CJK encoding, GE seems to have been used for accessing a
single-byte page of extended symbols (such as code page 310 for APL)
while using a more conventional EBCDIC page as the main set. Arguably,
the GSM escape behaves the same way, i.e. as a graphic escape (GE)
rather than as an ESC.
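
To make the single-shift reading concrete, here is a rough Python sketch
of decoding already-unpacked GSM 7-bit septets, where 0x1B selects the
extension table for exactly one following septet. The tables are my own
transcription of the default alphabet and default extension table; the
names (GSM_BASIC, GSM_EXT, decode_septets) are made up for illustration,
and falling back to the basic table for undefined extension codes is a
simplification, not a full implementation of the spec.

```python
# Basic table, indexed by septet value; 0x1B is the escape position.
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNO"
    "PQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmno"
    "pqrstuvwxyzäöñüà"
)

# Default extension table: reached only via a preceding 0x1B.
GSM_EXT = {
    0x0A: "\f", 0x14: "^", 0x28: "{", 0x29: "}", 0x2F: "\\",
    0x3C: "[", 0x3D: "~", 0x3E: "]", 0x40: "|", 0x65: "€",
}


def decode_septets(septets):
    """Decode a sequence of septet values (0..127) to a str."""
    out = []
    it = iter(septets)
    for s in it:
        if s == 0x1B:
            # Single shift: applies to the next septet only.
            nxt = next(it, None)
            if nxt is None:
                break  # dangling escape at end of input
            # Assumption: undefined extension codes fall back to the
            # basic table entry at the same position.
            out.append(GSM_EXT.get(nxt, GSM_BASIC[nxt]))
        else:
            out.append(GSM_BASIC[s])
    return "".join(out)
```

For example, decode_septets([0x1B, 0x3D]) gives "~", while a bare 0x3D
gives "=". Note also that septets 0x00 to 0x1A map to graphic characters
here, not to controls.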
From an ECMA-35 perspective, it doesn't really matter if 0x1B in
Teletext and GSM is (a) ESC with a different behaviour to that specified
in ECMA-35 or (b) something other than ESC. Since ECMA-35 explicitly
reserves 0x1B for ESC and forbids C0 sets from redefining it, and also
defines the behaviour of ESC including the general structure of ESC
sequences (which ECMA-48 conforms to), either reading is equally
non-conformant. In the case of GSM, it is further non-conformant by
assigning graphic characters to the CL area (0x00 to 0x1F), which is
reserved for C0 controls.
---
> That’s what I said (though I said SMS and cell broadcast 7-bit charsets; GSM (2G) is somewhat outdated, we're (mostly) on 4G and 5G now).
And yet, when I open my (Android 6.0) SMS app, with an active 4G
connection, in the UK, and type a ' (ASCII apostrophe) character, it
reports I have 159 characters remaining until it has to send a
multi-part SMS. When I delete that character and type a ~ (tilde)
instead, it reports only 158 characters remaining. When I delete that
and type a ` (backtick), it reports only 69 characters remaining. And
as one might have guessed, if I delete that and paste in a 𐐔, it
reports 68 characters remaining.
The amount of text that fits in 1120 bits (160 septets, or 70 16-bit
code units) under either GSM 7-bit (if within its repertoire) or UTF-16
(otherwise) is still a relevant metric, it seems.
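
The counts above can be reproduced with a short Python sketch. It
assumes a single-part message with a 1120-bit payload, the default GSM
7-bit alphabet with its default extension table (as transcribed by me),
and a fallback to 16-bit code units for anything outside that
repertoire. The function name remaining() is hypothetical, and I am not
claiming this is how the Android app computes its counter.

```python
# Characters that take one septet (0x1B itself is deliberately omitted).
GSM_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Characters that take two septets: 0x1B followed by one more septet.
GSM_EXTENSION = set("\f^{}\\[~]|€")

PAYLOAD_BITS = 1120  # 140 octets


def remaining(text):
    """Units left in a single-part message after typing `text`."""
    if all(ch in GSM_BASIC or ch in GSM_EXTENSION for ch in text):
        used = sum(2 if ch in GSM_EXTENSION else 1 for ch in text)
        return PAYLOAD_BITS // 7 - used
    # Outside the 7-bit repertoire: count UTF-16 code units, so a
    # supplementary-plane character costs two (a surrogate pair).
    used = len(text.encode("utf-16-be")) // 2
    return PAYLOAD_BITS // 16 - used


print(remaining("'"))           # 159
print(remaining("~"))           # 158
print(remaining("`"))           # 69
print(remaining("\U00010414"))  # 68
```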
--Har.