Teletext separated mosaic graphics

Doug Ewell doug at
Sun Oct 4 19:07:26 CDT 2020

Kent Karlsson wrote:

>>> See for example the definitions for SPL and STL here:
>>> (that document details
>>> the C1 control codes for Data Syntax 2 Serial Videotex—which would
>>> seem to be the Teletext set but as a C1 set, and as such with CSI
>>> rather than ESC).
>> Applications of any sort that are compliant with ISO/IEC 6429
>> (ECMA-48, ANSI X3.64) should understand ESC [ as a synonym for CSI.
> Teletext is not compliant with ECMA-48 (unless converted).

You're right, and I had sort of said that farther down. I didn't read the definitions or Harriet's synopsis carefully enough, and misinterpreted the reference to “CSI rather than ESC.”

The UK Videotex control codes are single bytes in the ECMA-35 C1 space, and can be adapted for 7-bit systems to ESC plus a corresponding value in the G0 space; but that does not make the system compliant with ECMA-48, and indeed it is not.
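
As a rough illustration of that 7-bit adaptation, here is a minimal Python sketch of the general ECMA-35 rule: a C1 byte in the range 0x80 to 0x9F becomes ESC followed by the byte 0x40 lower. The function names are hypothetical, and the sketch only covers the byte-level mapping; as noted above, the result is not ECMA-48 data.

```python
ESC = 0x1B  # the ESCAPE control character


def c1_to_7bit(byte: int) -> bytes:
    """Return the 7-bit form (ESC + final byte) of one C1 control byte."""
    if not 0x80 <= byte <= 0x9F:
        raise ValueError("not a C1 control byte: 0x%02X" % byte)
    # The final byte lies in 0x40-0x5F, i.e. 0x40 below the 8-bit code.
    return bytes([ESC, byte - 0x40])


def to_7bit(data: bytes) -> bytes:
    """Rewrite every C1 byte in data as its two-byte ESC form."""
    out = bytearray()
    for b in data:
        if 0x80 <= b <= 0x9F:
            out += c1_to_7bit(b)
        else:
            out.append(b)  # all other bytes pass through unchanged
    return bytes(out)
```

For example, c1_to_7bit(0x9B) yields ESC followed by "[", which is the familiar 7-bit spelling of CSI.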

>> - "contiguous graphics" becomes U+0019
>> - "separated graphics" becomes U+001A
>> - "double height" becomes U+000D
>> - "end box" becomes U+000A
> That would be an extremely bad idea (as well as being completely non-
> compliant with ECMA-48, if that is still the approach, as I think it
> should be).

As you just said, correctly, teletext is not compliant with ECMA-48.

UTC has confirmed it will not add more control characters for backward compatibility purposes like this. (I don't think there is a promise not to encode more completely novel control characters, such as for hieroglyphics, but that is not the question here.)

We all know there is no such thing in Unicode as a "hybrid" character that is sometimes a control character and sometimes a graphic character in normal use. We know that Unicode has defined fixed meanings for a subset of the C0 control characters, including CR and LF. But a teletext application for a modern computer is not "normal use." It is reasonable for a non-standard application like this to interpret U+0000 through U+001F the way teletext interprets the corresponding ISO 646 code positions. It is, frankly, the only choice.
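
To make that concrete, here is a minimal Python sketch of how such an application might look up the teletext meaning of a code point. It covers only the four assignments listed above; the table name and the describe function are hypothetical, and a real application would need the full teletext control set.

```python
# Teletext meanings for the four code points listed above.
# A complete table would cover all of U+0000..U+001F.
TELETEXT_CONTROLS = {
    0x0A: "end box",
    0x0D: "double height",
    0x19: "contiguous graphics",
    0x1A: "separated graphics",
}


def describe(ch: str) -> str:
    """Describe one character the way this teletext sketch treats it."""
    cp = ord(ch)
    if cp in TELETEXT_CONTROLS:
        return TELETEXT_CONTROLS[cp]
    if cp < 0x20:
        return "other control (not covered in this sketch)"
    return "graphic character"
```

Note that U+000D and U+000A are looked up like any other control here; they do not act as CR and LF.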

> I don’t know how Teletext is represented in DVB or IP-TV; but those
> digital representations of TV images do not use traditional ”analog”
> representation of TV images, and hence cannot have the ”analog”
> representation of ”rows” (lines) of text in Teletext. (And yes,
> Teletext does work fine with IP-TV.)

Rows in teletext are defined in a completely different way from the now-standard model of a continuous stream of characters in which lines are delimited by a sequence of one or more "end-of-line" control characters. The teletext row model is more akin to the fixed-length record model from the punch-card and tape era.
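
A minimal Python sketch of the fixed-length model follows. The 40-character row length is an assumption of this sketch, as are the names ROW_LENGTH and split_rows; the point is only that rows are found by position, not by scanning for an end-of-line character.

```python
from typing import List

ROW_LENGTH = 40  # assumed row length for this sketch


def split_rows(page: bytes, row_length: int = ROW_LENGTH) -> List[bytes]:
    """Cut a page buffer into fixed-length rows, ignoring its content."""
    if len(page) % row_length != 0:
        raise ValueError("page length is not a multiple of the row length")
    # Bytes 0x0A and 0x0D inside a row are ordinary data here,
    # not line terminators.
    return [page[i:i + row_length] for i in range(0, len(page), row_length)]
```

A byte 0x0A in the middle of the buffer therefore stays inside its row instead of starting a new one.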

> Note also that Teletext is rife with ”code page switching”. ESC
> toggles between a primary and a secondary charset (for text). In a
> control part of the Teletext protocol one sets the charsets for text
> (options include various ”national variants” of ISO/IEC 646, as well
> as Greek, Hebrew and Arabic (visual order, preshaped)).

A teletext application would probably be expected to implement that as well.

> Toggling between separated and contiguous ”mosaics” is also best seen
> as a switch between charsets.

Which is why we did not propose the separated mosaics in Round 1, and Script Ad-Hoc and UTC agreed.

> Regarding it as a styling is odd, since this particular styling would
> only apply to a few very rarely used characters, and the change is not
> one that is recognized as styling elsewhere. In addition, you have
> already encoded separated and contiguous other but similar ”mosaics”
> characters as separate characters.

We tried to be as consistent as possible with the Legacy Symbols proposal, and to propose things separately only where some legacy platform encoded them separately, not just with a mode shift or by masking the code point with 0x80. There may be imperfections in the model, based on what SAH did and did not approve.

> Even the colour controls in Teletext switch between text and mosaics
> (and in addition are usually displayed as a space, as is the norm in
> Teletext for ”control” characters).

That is certainly behavior that a teletext application should emulate.

> Part of the Teletext protocol specifies how to set/unset bold/italic/
> underline. But that is not inline in the text, it is ”out-of-line”
> elsewhere in the protocol (in a control part). But colouring, certain
> sizing, blink, conceal, and ”boxing” (used for (optional) subtitling
> and news flash messages) are inline. Note that Teletext is still often
> used for subtitling.

That is another reason why it is probably not appropriate to try to represent teletext in a plain-text file. You can certainly convert it to a plain-text file, with ECMA-48 sequences for styling and lines ending in CR and/or LF, but then it is no longer "teletext data" but a conversion.
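
As a rough sketch of the simplest part of such a conversion, the Python below turns one fixed-length row into a plain-text line. It assumes 7-bit row bytes that are ASCII-compatible above 0x1F, shows every control code as a space, and ends the line with CR LF. It deliberately drops all styling rather than emitting ECMA-48 sequences, and the function name row_to_line is hypothetical.

```python
def row_to_line(row: bytes) -> str:
    """Convert one teletext row to a plain-text line (lossy sketch)."""
    chars = []
    for b in row:
        if b > 0x7F:
            raise ValueError("expected 7-bit data, got 0x%02X" % b)
        # Control codes occupy a character cell and are shown as a space.
        chars.append(" " if b < 0x20 else chr(b))
    # The converted line gets an explicit terminator the row never had.
    return "".join(chars) + "\r\n"
```

The output keeps the row's width but none of its attributes, which is exactly why it counts as a conversion and not as teletext data.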

> Most of Teletext styling can be converted to ECMA-48 styling as is.
> Some others will need an extension of ECMA-48 to be representable in
> that framework.

I read with interest your proposal last year to update ECMA-48. I think the proposed extensions and clarifications had a better chance of adoption than the suggestions to change existing functionality outright. I am curious about the current status of that proposal; was it submitted anywhere?

Doug Ewell, CC, ALB | Thornton, CO, US |
