Re: “plain text styling”…

Harriet Riddle harjitmoe at outlook.com
Thu Jan 12 13:05:36 CST 2023


Kent Karlsson via Unicode wrote:
> I did. It is misnamed there and for Teletext. And yes, I know there were other ESC sequence definitions before ECMA-48, which still were ESC sequences, not ”jumping” to another codepage.


EBCDIC's single-shift is called GE (graphic escape).  Where EBCDIC's SI 
and SO are mostly used for switching between single-byte and double-byte 
pages in a CJK encoding, GE seems to have been used for accessing a 
single-byte page of extended symbols (such as code page 310 for APL) 
while using a more conventional EBCDIC page as the main set.  Arguably, 
the GSM escape is a graphic escape (GE).

From an ECMA-35 perspective, it doesn't really matter if 0x1B in 
Teletext and GSM is (a) ESC with a different behaviour to that specified 
in ECMA-35 or (b) something other than ESC.  Since ECMA-35 explicitly 
reserves 0x1B for ESC and forbids C0 sets from redefining it, and also 
defines the behaviour of ESC including the general structure of ESC 
sequences (which ECMA-48 conforms to), either is equally 
non-conformant.  In the case of GSM, it is further non-conformant by 
encoding glyphs over the CL area, which is reserved for C0 controls.

---

> That’s what I said (though I said SMS and cell broadcast 7-bit charsets; GSM (2G) is somewhat outdated, we're (mostly) on 4G and 5G now).


And yet, when I open my (Android 6.0) SMS app, with an active 4G 
connection, in the UK, and type a ' (ASCII apostrophe) character, it 
reports I have 159 characters remaining until it has to send a 
multi-part SMS.  When I delete that character and type a ~ (tilde) 
instead, it reports only 158 characters remaining.  When I delete that 
and type a ` (backtick), it reports only 69 characters remaining.  And 
as one might have guessed, if I delete that and paste in a 𐐔, it 
reports 68 characters remaining.

The amount of text that fits in 1120 bits under either GSM 7-bit (if 
within its repertoire) or UTF-16 (otherwise) is still a relevant metric, 
it seems.
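
As a rough sketch of the arithmetic behind those counts (not the actual logic of any SMS app), the following assumes the GSM 03.38 default alphabet and its default extension table with no national language shift tables in use, and falls back to UTF-16 for anything outside that repertoire.  The names GSM_BASIC, GSM_EXTENDED and remaining_in_single_sms are made up for illustration.

```python
# Assumption: GSM 03.38 default alphabet (0x1B itself omitted) and the
# default extension table; national shift tables are not considered.
GSM_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Each of these is sent as 0x1B followed by one more septet.
GSM_EXTENDED = set("\f^{}\\[~]|€")

PAYLOAD_BITS = 1120  # 140 octets of user data in a single-part message


def remaining_in_single_sms(text):
    """Return how many more units fit in a single-part SMS after `text`."""
    if all(ch in GSM_BASIC or ch in GSM_EXTENDED for ch in text):
        # 7-bit case: one septet per basic character, two per escaped one.
        used = sum(2 if ch in GSM_EXTENDED else 1 for ch in text)
        return PAYLOAD_BITS // 7 - used
    # Otherwise count UTF-16 code units (a surrogate pair counts as two).
    used = len(text.encode("utf-16-be")) // 2
    return PAYLOAD_BITS // 16 - used


for sample in ["'", "~", "`", "\U00010414"]:
    print(repr(sample), remaining_in_single_sms(sample))
```

Under those assumptions, the four test strings give 159, 158, 69 and 68, matching what the app reports.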

--Har.

