Re: “plain text styling”…

Kent Karlsson kent.b.karlsson at bahnhof.se
Thu Jan 12 19:01:01 CST 2023


This is getting too off-topic. But just two small remarks. (After this I will not comment more on SMS stuff in this thread.)

> 12 jan. 2023 kl. 20:05 skrev Harriet Riddle via Unicode <unicode at corp.unicode.org>:
>
> From an ECMA-35 perspective, it doesn't really matter if 0x1B in Teletext and GSM is (a) ESC with a different behaviour to that specified in ECMA-35 or (b) something other than ESC.  Since ECMA-35 explicitly reserves 0x1B for ESC and forbids C0 sets from redefining it, and also defines the behaviour of ESC including the general structure of ESC sequences (which ECMA-48 conforms to), either is equally non-conformant.  In the case of GSM, it is further non-conformant by encoding glyphs over the CL area, which is reserved for C0 controls.

There is no notion of C0, G0, etc. in these 7-bit charsets. But the 7-bit charsets do have a “secondary codepage” (by another name) and are prepared for having a “tertiary codepage” (though that is not yet used).

> ---
> 
>> That’s what I said (though I said SMS and cell broadcast 7-bit charsets; GSM (2G) is somewhat outdated, we're (mostly) on 4G and 5G now).
> 
> 
> And yet, when I open my (Android 6.0) SMS app, with an active 4G connection, in the UK, and type a ' (ASCII apostrophe) character, it reports I have 159 characters remaining until it has to send a multi-part SMS.  When I delete that character and type a ~ (tilde) instead, it reports only 158 characters remaining.  When I delete that and type a ` (backtick), it reports only 69 characters remaining.  And as one might have guessed, if I delete that and paste in a 𐐔, it reports 68 characters remaining.
> 
> The amount of text that fits in 1120 bits under either GSM 7-bit (if within its repertoire) or UTF-16 (otherwise) is still a relevant metric, it seems.

Backwards compatibility is a big issue here, of course. If no new-fangled extension is used, everything should work as before, also for “old” user equipment (usually mobile phones). That holds both for the charsets and for the protocol itself. If something new-fangled is used, “old” equipment may display “mojibake”.

And if the text cannot be represented in one of the 7-bit charsets (there are now several), a switch to “UCS-2” (3GPP still does not call it “UTF-16BE”…) can be done, though the 3GPP standards do not require that; it is application defined.
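
For what it is worth, the counts quoted above (159, 158, 69, 68) can be reproduced with a rough sketch like the one below. It assumes only the default 7-bit alphabet and its one extension table (no national language shift tables), no user data header, a single 140-octet message, and that the application falls back to 16-bit code units when the text does not fit the 7-bit repertoire. The names `GSM_BASIC`, `GSM_EXTENSION` and `remaining` are made up for the illustration and come from no standard or library.

```python
# Rough sketch: characters remaining in a single SMS (1120 bits of user data).
# Assumption: default 7-bit alphabet plus its extension table only.

SINGLE_SMS_BITS = 1120  # 140 octets

# Default alphabet, 127 characters (0x1B is the escape to the extension table).
GSM_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)

# Extension table: each of these costs two septets (escape + code).
GSM_EXTENSION = set("^{}\\[~]|€\f")


def remaining(text: str) -> int:
    """Return how many more units fit in one message (negative if too long)."""
    if all(ch in GSM_BASIC or ch in GSM_EXTENSION for ch in text):
        # 7-bit case: 1120 // 7 = 160 septets available.
        used = sum(2 if ch in GSM_EXTENSION else 1 for ch in text)
        return SINGLE_SMS_BITS // 7 - used
    # 16-bit fallback: 1120 // 16 = 70 code units; a character outside
    # the BMP takes two code units (a surrogate pair).
    used = len(text.encode("utf-16-be")) // 2
    return SINGLE_SMS_BITS // 16 - used


print(remaining("'"))           # apostrophe: in the default alphabet
print(remaining("~"))           # tilde: in the extension table
print(remaining("`"))           # backtick: not in the 7-bit repertoire
print(remaining("\U00010414"))  # a character outside the BMP
```

The tilde costs two septets because it sits in the “secondary codepage”, and the backtick forces the whole message over to 16-bit code units, which is why the count drops from 160 to 70 at once.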

/Kent K

> --Har.
