A teletext control codes encoding suggestion

Kent Karlsson kent.b.karlsson at bahnhof.se
Tue Jan 4 17:59:53 CST 2022


(A bit long reply, sorry for that.)

> On 4 Jan 2022, at 23:12, Sławomir Osipiuk via Unicode <unicode at corp.unicode.org> wrote:
> 
> But what if I want to display a literal tilde followed by capital A and G in my teletext page without changing colour? This proposal has no method to escape the "controls".
> 
> If the idea is just "use existing Unicode characters in special combinations to encode functionality" (and a lot of Mr. Overington's ideas seem to revolve around this) then why not use one of the innumerable EXISTING standards which use special combinations of characters? I seem to recall one that uses greater-than and less-than signs to surround key-value pairs, something like <font color="green">.

HTML/CSS is possible, but a bit heavy-handed. Not everything has to be HTML (or XML), however popular it is.

HTML has been and is used for presenting Teletext pages on the web. Some used to use styled text, but now generating an image seems to have taken over, and that image is embedded in HTML… Quite heavy-handed (except for browsers), and not searchable or indexable unless the text in the image is also given as text (which does not seem to be the case). And certainly not suitable for archiving Teletext pages.

> Or if that's too space-inefficient, Harriet Riddle's idea of using the C1 control space along with the proper ISO-2022 designation is about as compact as it gets, and stays closest to the spirit of control characters.

That is out of the question, I’d say, and a very bad idea, much worse than what William has proposed (there have been several proposals).

However, CSI 32m is the control sequence for foreground ”green” in ECMA-48 (ISO/IEC 6429) encoding. By popular extension, CSI 92m is the control sequence for foreground ”bright green”. This is implemented in just about all terminal emulators, but ECMA-48 is not at all limited to terminal emulators. See https://en.wikipedia.org/wiki/ANSI_escape_code.
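To make the two sequences concrete, here is a minimal Python sketch. The helper name sgr is made up for this illustration, and the sketch assumes the 7-bit form of CSI (ESC followed by "["), which is what terminal emulators normally accept.

```python
ESC = "\x1b"
CSI = ESC + "["  # 7-bit representation of CSI (assumption: ESC [ form)


def sgr(*params: int) -> str:
    """Build an SGR control sequence from numeric parameters.

    Hypothetical helper for illustration, not part of any library.
    """
    return CSI + ";".join(str(p) for p in params) + "m"


GREEN = sgr(32)         # foreground green, as defined in ECMA-48
BRIGHT_GREEN = sgr(92)  # foreground bright green, a popular extension
RESET = sgr(0)          # back to the default rendition

if __name__ == "__main__":
    print(GREEN + "green" + RESET + " " + BRIGHT_GREEN + "bright green" + RESET)
```

Run in a terminal emulator that implements these sequences, the two words should appear in the two shades of green.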

So there is already a good candidate ”styling” (and other things) standard for handling this, and its SGR (Select Graphic Rendition) control sequences already cover a lot of the Teletext styling (functionality-wise), though some additions are needed to cover all of it.

> There's no compelling reason for yet another markup. "Reuse" is not a dirty concept.

I agree. But one does need a few extensions to ECMA-48 styling to cover the functionality of Teletext styling, most importantly for Box start and Box end (which are used for subtitling). That can be done by extending SGR (in ECMA-48) to also handle Box start/end, plus a few other extensions to cover all of Teletext styling. (Yes, I have looked into this in detail.)

One also needs to separate the functionalities of the Teletext ”control” codes. Most of them each serve three functions:

1) A graphic character, usually SPACE (but sometimes a ”mosaic” character); rendered before or after the colour (or other) change, as specified in the Teletext standard. That must be handled by the converter (to Unicode plus ECMA-48 (ISO/IEC 6429)).

2) Code page switching, which is specific to Teletext and needs to be converted away.

3) Most of them change foreground colour (and one changes background colour); the ECMA-48 control sequences for colour change are non-spacing (cf. point 1).
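The separation of functions 1 and 3 can be sketched in Python as follows. This is only an illustration under stated assumptions, not a complete converter: the function name convert_row is made up; the code values (0x01 to 0x07 for alphanumeric colours, 0x1C for black background, 0x1D for new background) and the rule that colour codes take effect after their own cell while background codes take effect at it reflect my reading of the Teletext specification; mosaics, Box start/end and the code page switching of function 2 are ignored, so graphic characters are simply treated as ASCII.

```python
CSI = "\x1b["  # 7-bit form of CSI


def convert_row(row: bytes) -> str:
    """Convert one Teletext row to text plus SGR sequences (sketch only)."""
    out = []
    fg = 7  # assumption: each row starts with white text on black
    for b in row:
        b &= 0x7F  # drop the parity bit
        if 0x01 <= b <= 0x07:
            # Alphanumeric colour: the cell shows SPACE, and the new
            # foreground colour applies after it.
            out.append(" ")
            fg = b
            out.append(f"{CSI}{30 + fg}m")
        elif b == 0x1D:
            # New background: the current foreground colour becomes the
            # background, starting at this cell.
            out.append(f"{CSI}{40 + fg}m")
            out.append(" ")
        elif b == 0x1C:
            # Black background, also starting at this cell.
            out.append(f"{CSI}40m")
            out.append(" ")
        elif b < 0x20:
            # Other control codes: only their spacing is kept here.
            out.append(" ")
        else:
            # Function 2 (code pages) is not handled: treated as ASCII.
            out.append(chr(b))
    out.append(f"{CSI}0m")  # reset to the default rendition at end of row
    return "".join(out)
```

For example, convert_row(bytes([0x02]) + b"Hi") yields a SPACE, then the sequence for foreground green, then "Hi", then a reset. The colour order of these Teletext codes happens to match SGR 31 to 37, which is why a simple offset is enough in this sketch.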

In addition, as I have mentioned multiple times, the Teletext protocol allows for more colours, and for styling such as underline, bold, italic or proportional font, to be given ”out-of-band” of the text (a start index is given along with such attributes). No surprise, ECMA-48 can support all of that (in-band), and several of these attributes (though not proportional font) are commonly implemented in terminal emulators (but ECMA-48, i.e. ISO/IEC 6429, itself is not limited to terminal emulators).
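Moving such out-of-band attributes in-band could look roughly like the Python sketch below. The input format, a list of (start index, attribute name) pairs, is invented for this illustration and is not the actual Teletext packet layout; the function name apply_attributes is likewise hypothetical. The SGR parameters used (1 for bold, 3 for italic, 4 for underline) are the ones defined in ECMA-48.

```python
CSI = "\x1b["  # 7-bit form of CSI

# ECMA-48 SGR parameters for the attributes handled in this sketch.
SGR_ON = {"bold": 1, "italic": 3, "underline": 4}


def apply_attributes(text: str, attrs: list[tuple[int, str]]) -> str:
    """Insert SGR sequences into text at the given start indices.

    attrs is a hypothetical out-of-band description: (start index, name).
    Each attribute stays in effect until the reset at the end of the text.
    """
    inserts: dict[int, list[int]] = {}
    for start, name in attrs:
        inserts.setdefault(start, []).append(SGR_ON[name])

    out = []
    for i, ch in enumerate(text):
        if i in inserts:
            # Several attributes at one index share a single sequence.
            out.append(CSI + ";".join(str(p) for p in inserts[i]) + "m")
        out.append(ch)
    out.append(CSI + "0m")  # reset to the default rendition
    return "".join(out)
```

For example, apply_attributes("Hello", [(0, "bold"), (2, "underline")]) puts the bold sequence before "H" and the underline sequence before the first "l". Start indices beyond the end of the text are silently ignored in this sketch.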

Teletext has never embraced any part of ECMA-48, which is a bit of a pity, and instead jumps through hoops to be fully backwards compatible with the original Teletext design, which is highly old-fashioned, in particular the codepage setup, clashing royally with such things as Unicode/10646, or indeed with anything that is not 7-bit with special code pages. ECMA-48, on the other hand, works well with Unicode too. It may also be suitable for archiving Teletext pages, if one wants to get away from the 7-bit codepage switching approach (which indeed makes searching and indexing hard).

/Kent K

> Sławomir Osipiuk