Teletext separated mosaic graphics

Tue Oct 6 18:11:56 CDT 2020

> On 5 Oct 2020, at 02:07, Doug Ewell via Unicode <unicode at unicode.org> wrote:
> 
> Kent Karlsson wrote:
> 
>>>> See for example the definitions for SPL and STL here:
>>>> https://www.itscj.ipsj.or.jp/iso-ir/056.pdf (that document details
>>>> the C1 control codes for Data Syntax 2 Serial Videotex—which would
>>>> seem to be the Teletext set but as a C1 set, and as such with CSI
>>>> rather than ESC).
>>> 
>>> Applications of any sort that are compliant with ISO/IEC 6429
>>> (ECMA-48, ANSI X3.64) should understand ESC [ as a synonym for CSI.
>> 
>> Teletext is not compliant with ECMA-48 (unless converted).
> 
> You're right, and I had sort of said that farther down. I didn't read the definitions or Harriet's synopsis carefully enough, and misinterpreted the reference to “CSI rather than ESC.”
> 
> The UK Videotex

And I’m talking about the current Enhanced Teletext specification, ETSI EN 300 706 V1.2.1 (2003-04), https://www.etsi.org/deliver/etsi_en/300700_300799/300706/01.02.01_60/en_300706v010201p.pdf. That seems to be the latest version, and it is, AFAICT, implemented in all(?) TV sets and “TV boxes” sold, I would think, worldwide.

I also just found “Digital Video Broadcasting (DVB); Specification for conveying ITU-R System B Teletext in DVB bitstreams” (ETSI EN 300 472 V1.4.1 (2017-04), https://www.etsi.org/deliver/etsi_en/300400_300499/300472/01.04.01_60/en_300472v010401p.pdf). (I haven’t scanned through it yet.)

> control codes are single bytes in the ECMA-35 C1 space, and can be adapted for 7-bit systems to ESC plus a corresponding value in the G0 space; but that does not make the system compliant with ECMA-48, and indeed it is not.
> 
>>> - "contiguous graphics" becomes U+0019
>>> - "separated graphics" becomes U+001A
>>> - "double height" becomes U+000D
>>> - "end box" becomes U+000A
>> 
>> That would be an extremely bad idea (as well as being completely non-
>> compliant with ECMA-48, if that is still the approach, as I think it
>> should be).
> 
> As you just said, correctly, teletext is not compliant with ECMA-48.
> 
> UTC has confirmed it will not add more control characters for backward compatibility purposes like this.

And these controls are not good anyway… They do three things in one go (i.e. per ”control” code):
1. Change charset (most of them)
2. Change color (most of them)
3. Display as a SPACE (or as a ”mosaic character”, if ”hold mosaics” is active)

I wouldn’t even think of proposing, or even perpetuating, this kind of thing. They are horrendous! In addition, all of them can be overridden by formatting (and character substitutions) in control “objects” given in the Teletext protocol.
The Teletext protocol also allows for “user defined” fonts (called DRCS in the Teletext specification). Converting those (and their use) is a different headache...
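
To make the three-in-one behaviour concrete, here is a minimal Python sketch of decoding one row of Level 1 codes. The names `Cell`, `COLOURS` and `decode_row` are made up for illustration. The sketch assumes the parity bit can simply be dropped, and it ignores “hold mosaics”, objects, DRCS and every control other than the alphanumeric and mosaic colour codes.

```python
from dataclasses import dataclass

# Colour order shared by the alphanumeric (0x01-0x07) and mosaic (0x11-0x17) codes.
COLOURS = ["black", "red", "green", "yellow", "blue", "magenta", "cyan", "white"]


@dataclass
class Cell:
    text: str     # the code occupying the cell (a space for controls)
    colour: str   # foreground colour in effect for this cell
    mosaic: bool  # True if the code should be read from the mosaic set


def decode_row(codes: bytes) -> list[Cell]:
    """Decode one row, tracking the state each control code changes."""
    colour, mosaic = "white", False  # every row starts as white alphanumerics
    cells = []
    for code in codes:
        code &= 0x7F  # assumption: parity bit is just stripped
        if code < 0x20:
            # Effect 3: the control itself occupies a cell, shown as a space.
            cells.append(Cell(" ", colour, False))
            if 0x01 <= code <= 0x07:
                # Effects 1 and 2: alphanumeric set plus a new colour.
                colour, mosaic = COLOURS[code], False
            elif 0x11 <= code <= 0x17:
                # Effects 1 and 2: mosaic set plus a new colour.
                colour, mosaic = COLOURS[code - 0x10], True
        else:
            # In mosaic mode only codes with bit 5 set are mosaics;
            # 0x40-0x5F still show as capital letters.
            cells.append(Cell(chr(code), colour, mosaic and bool(code & 0x20)))
    return cells
```

For example, `decode_row(bytes([0x01, 0x41]))` yields a white space cell followed by a red “A”: one byte changed the colour, selected the character set, and took up a cell on screen.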

> (I don't think there is a promise not to encode more completely novel control characters, such as for hieroglyphics, but that is not the question here.)
> 
> We all know there is no such thing in Unicode as a "hybrid" character that is sometimes a control character and sometimes a graphic character in normal use. We know that Unicode has defined fixed meanings for a subset of the C0 control characters, including CR and LF. But a teletext application for a modern computer is not "normal use.”

Sure it is. Teletext pages are already displayed in HTML pages (and they don't convert the Teletext pages to images before display; they could, but that is not necessary, and they don't). Teletext pages are also displayed in mobile phone (tablet) apps.

Try out the web site ”texttv.nu” (also available as an iOS app under the same name); it displays the current(!!!) Teletext pages from SVT (it may have some minutes of delay, if there is a change, and the app does notify of changes). Perfectly normal web pages (with text, not images), perfectly normal mobile app. There are several other web pages and apps that do similar display of Teletext pages, also for other TV channels. (I listed a few more in another email a few months ago.) (SVT do their own web pages for their Teletext content, but those pages are less faithful to the TV rendering: https://www.svt.se/svttext/webu/pages/100.html.)

> It is reasonable for a non-standard application like this to interpret characters from U+0000 to U+001F as the corresponding ISO 646 characters would be in teletext. It is, frankly, the only choice.

Quite the contrary, that is a definite NON-option. All of these Teletext “controls” can be converted to HTML/CSS (including charset switching before conversion to Unicode, and including styling and character “object overrides” in Teletext). They include underline, bold, italics, proportional spacing, more colors and character replacements [the latter would be part of character conversion, not part of styling]. It is not that hard to figure out extensions to ECMA-48 that also cover the odder bits (except “user defined” fonts), like “boxing”. What is missing (currently) are the “separated mosaics graphic” characters…
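
For contrast, the contiguous mosaics can already be converted, since Unicode 13.0 has the block sextants at U+1FB00..U+1FB3B. A rough Python sketch follows. The function name `mosaic_to_unicode` is hypothetical, and the bit-to-cell layout (bits 0 to 4 and bit 6 giving the six cells from top left to bottom right) is my reading of the G1 table in EN 300 706, so it should be checked against the specification before use.

```python
def mosaic_to_unicode(code: int) -> str:
    """Map a 7-bit contiguous mosaic code to a Unicode character."""
    if not (0x20 <= code <= 0x3F or 0x60 <= code <= 0x7F):
        raise ValueError(f"not a mosaic code: {code:#04x}")
    # Pack bits 0-4 and bit 6 into a 6-bit pattern (bit 5 is always set).
    pattern = (code & 0x1F) | ((code & 0x40) >> 1)
    # Four patterns are unified with characters that existed before Unicode 13.0.
    if pattern == 0:
        return " "
    if pattern == 21:
        return "\u258C"  # LEFT HALF BLOCK
    if pattern == 42:
        return "\u2590"  # RIGHT HALF BLOCK
    if pattern == 63:
        return "\u2588"  # FULL BLOCK
    # The sextant block lists the remaining patterns in order, skipping 21 and 42.
    offset = pattern - 1 - int(pattern > 21) - int(pattern > 42)
    return chr(0x1FB00 + offset)
```

There is nothing comparable to call for the separated forms; that is the gap.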

> 
>> I don’t know how Teletext is represented in DVB or IP-TV; but those
>> digital representations of TV images do not use traditional ”analog”
>> representation of TV images, and hence cannot have the ”analog”
>> representation of ”rows” (lines) of text in Teletext. (And yes,
>> Teletext does work fine with IP-TV.)
> 
> Rows in teletext are defined in a completely different way from the now-standard model of a continuous stream of characters that are delimited by a sequence of one or more "end-of-line" control characters. The teletext row model is more akin to the fixed-length model from the punch-card and tape era.

Yes, but that in no way prevents conversion to “normal” line-breaking characters instead of the “row” concept.
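
As a small illustration, a Python sketch that turns fixed-length rows into LF-terminated lines. It assumes the page has already been converted to a Unicode string holding one character per cell, 40 cells per row; `rows_to_text` and `ROW_LENGTH` are names invented for this example.

```python
ROW_LENGTH = 40  # characters per Teletext row


def rows_to_text(page: str, row_length: int = ROW_LENGTH) -> str:
    """Split a fixed-width page into rows and join them with line feeds."""
    rows = [page[i:i + row_length] for i in range(0, len(page), row_length)]
    # Trailing padding only existed to fill the row, so drop it.
    return "\n".join(row.rstrip(" ") for row in rows) + "\n"
```

Stripping the trailing spaces is a design choice, not a requirement; a converter that needs background colours to extend to the end of the row would keep them.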

>> Note also that Teletext is rife with ”code page switching”. ESC
>> toggles between a primary and a secondary charset (for text). In a
>> control part of the Teletext protocol one sets the charsets for text
>> (options include various ”national variants” of ISO/IEC 646, as well
>> as Greek, Hebrew and Arabic (visual order, preshaped)).
> 
> A teletext application would probably be expected to implement that as well.

Yes, one that is “general purpose”. (I cannot vouch that current converters to HTML are that complete.)

> 
>> Toggling between separated and contiguous ”mosaics” is also best seen
>> as a switch between charsets.
> 
> Which is why we did not propose the separated mosaics in Round 1, and Script Ad-Hoc and UTC agreed.

?? That seems to contradict what I said.

> 
>> Regarding it as a styling is odd, since this particular styling would
>> only apply to a few very rarely used characters, and the change is not
>> one that is recognized as styling elsewhere. In addition, you have
>> already encoded separated and contiguous other but similar ”mosaics”
>> characters as separate characters.
> 
> We tried to be as consistent as possible with the Legacy Symbols proposal,

Teletext is not legacy (yet).

> and to propose things separately only where some legacy platform encoded them separately, not just with a mode shift or by masking the code point with 0x80.

????

> There may be imperfections in the model, based on what SAH did and did not approve.

????

> 
>> Even the colour controls in Teletext switch between text and mosaics
>> (and in addition are usually displayed as a space, as is the norm in
>> Teletext for ”control” characters).
> 
> That is certainly behavior that a teletext application should emulate.

Part of character encoding conversion, not of styling.

> 
>> Part of the Teletext protocol specifies how to set/unset bold/italic/
>> underline. But that is not inline in the text, it is ”out-of-line”
>> elsewhere in the protocol (in a control part). But colouring, certain
>> sizing, blink, conceal, and ”boxing” (used for (optional) subtitling
>> and news flash messages) are inline. Note that Teletext is still often
>> used for subtitling.
> 
> Another reason why it is probably not appropriate to try to represent teletext in a plain-text file.

You will need the styling, either as HTML/CSS (as is already done, though the conversion might not be complete) or using an extension of ECMA-48. But there is no reason to perpetuate the arcane “Teletext controls” and (also arcane) “Teletext objects”. Otherwise it is perfectly reasonable to represent Teletext pages as HTML/CSS files (that is done already, often including a navigation section for navigating more comfortably between pages, and converting three-digit numbers to links to other pages), or as (extended) ECMA-48 files. Perfectly normal files, with linefeed or HTML markup for representing lines/“rows”.
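
A sketch of the ECMA-48 side, in Python, for the simplest case of foreground colour. It relies on the Teletext colour numbers 0 to 7 (black, red, green, yellow, blue, magenta, cyan, white) being in the same order as the SGR parameters 30 to 37. The function name `to_sgr` and the (text, colour) run format are assumptions made for this example; none of the extensions discussed here are involved.

```python
CSI = "\x1b["  # 7-bit form of CSI: ESC [


def to_sgr(runs: list[tuple[str, int]]) -> str:
    """Render (text, colour) runs as text with ECMA-48 SGR colour sequences."""
    out = []
    for text, colour in runs:
        if not 0 <= colour <= 7:
            raise ValueError(f"colour out of range: {colour}")
        out.append(f"{CSI}{30 + colour}m{text}")  # SGR 30-37: foreground colour
    out.append(f"{CSI}0m")  # SGR 0: back to default rendition
    return "".join(out)
```

Printing `to_sgr([("NEWS", 3), (" 100", 7)])` in a terminal emulator should show “NEWS” in yellow and “ 100” in white.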

> You can certainly convert it to a plain-text file, with ECMA-48 sequences for styling and lines ending in CR and/or LF, but then it is no longer "teletext data" but a conversion. 

So? If you convert Teletext text (skipping over styling and such for the moment) to Unicode, it is no longer Teletext, since Teletext has nothing in Unicode… But you do want certain characters in Unicode just for the purpose of such a conversion…

I think one needs to distinguish between the Teletext protocol (the synch scan line representation is already obsolete; but Teletext does still exist in DVB and IP-TV; the low level representation there I do not know, but see reference above to an ETSI standard about just that) and Teletext pages (the content). Teletext content is still being produced and presented via DVB/IP-TV as well as web pages and apps. The latter two obviously do not use the Teletext protocol; I don’t know how, and in what format, they get the base page data from the TV channels.

> 
>> Most of Teletext styling can be converted to ECMA-48 styling as is.
>> Some others will need an extension of ECMA-48 to be representable in
>> that framework.
> 
> I read with interest your proposal last year to update ECMA-48.

> I think the proposed extensions and clarifications had a better chance of adoption than the suggestions to change existing functionality outright.

Some things have just diverged for absolutely no benefit. Some other things have been outright wrong in some implementations, and cannot be carried forward.

> I am curious about the current status of that proposal; was it submitted anywhere?

I’m still editing it; the very last changes (I have to stop tinkering…).

I hope it will be a UTN; I have proposed it as such, and I think it would fit very well as one. “Control functions”, whether as singular codes or as escape sequences or as control sequences, have traditionally been seen as in the character encoding realm, and my proposal has several suggestions pertaining directly to Unicode in an ECMA-48 control sequence context. I’m not proposing that Unicode TC take over ECMA-48, but I have no hope of “reviving” in some way an ECMA-48 committee. But ECMA-48 control sequences are still very much part of our “digital text ecosystem”, even though they are currently used almost exclusively in terminal emulators. HTML/CSS is not at all all-encompassing. So I think ECMA-48 needs an update for Unicode, as well as for other functionality.

/Kent K

> 
> --
> Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org
> 
> 
> 
