Surrogates and noncharacters

Philippe Verdy verdy_p at
Tue May 12 11:44:15 CDT 2015

Even if UTF-8 initially started as part of a Unix standardization
process, it was for the purpose of allowing interchange across systems.
The networking concept was already there (otherwise it would not have been
part of the emerging *nix standardization processes, and would have
remained a proprietary encoding in local systems).

At the same time, the Internet was also about to emerge as a worldwide
network, but it was still very limited and full of restrictions,
accessible only through a few (very costly) gateways in other countries, and
not even with the IP protocol but with many specific protocols (maybe you
remember the time of CompuServe, billed only in US dollars and only via
international payments with costly bank processing fees; you also had to
call an international phone number before a few national phone numbers
appeared, operated jointly by CompuServe and some national or regional
services).

At that time, the telcos were not even interested in participating, and all
wanted to develop their own national or regional networks with their own
protocols and "national" standards. Real competition in telecommunications
only started just before Y2K, with deregulation in North America and
some parts of Europe (in fact just in the EEA), before progressively going
worldwide when the initial competitors started to restructure, split and
merge, and to align their too many technical standards with the need for a
common interoperable one that would work in all their new local branches. In
fact the worldwide Internet would not have become THE global network
without the reorganisation of the older deregulated national telcos and the
end of their monopolies.

The development of "the" Internet and the development of the UCS thus
proceeded entirely in parallel. Both appeared in order to replace former
national standards in the same domains previously operated by the former
telecommunications monopolies (which also needed computing and data
standards, not just networking standards).

In the early days of the Internet, the IP protocol was still not really
adopted as the universal internetworking protocol (competitors were also
proposed by private companies, notably Token-Ring by IBM, and the X.21/X.25
family promoted essentially by European telcos, which preferred realtime
protocols with guaranteed/reserved bandwidth, and packet switching
rather than frames of variable sizes).

Even today, some parts of the X* network family remain, but
only for short-distance private links: e.g. with ATM (in xDSL
technologies), for local buses within electronic devices (under the 1
meter limit), or within some mission-critical systems (realtime constraints
used for networking equipment in aircraft, which have their own standards,
with a few of them developed recently as adaptations of Internet technologies
over channels in a realtime network, generally not structured as a "mesh"
but with a "star" topology and dedicated bandwidth).

If you want to look for remaining text encoding standards that are still
not based on the UCS, look into aircraft technologies and military
equipment (there's also the GSM family of protocols, which continues to
keep many legacy proprietary standards, with poor adaptation to Internet
technologies and the UCS...)

The situation is now starting to change in aircraft/military technology too
(first with Airbus in Europe, now followed by its major US competitors) and
in mobile networks (4G), with the full integration of the IEEE Ethernet
standard, which allows a more natural and straightforward integration of IP
protocols and the UCS standards (even if compatibility is kept by
reserving space for former protocols, something that the IEEE Ethernet
standard has already facilitated for the Internet we know now, both in
worldwide communications and in private LANs)...

2015-05-12 17:58 GMT+02:00 Hans Aberg <haberg-1 at>:

> > On 12 May 2015, at 16:50, Philippe Verdy <verdy_p at> wrote:
> >
> >> Indeed, that is why UTF-8 was invented for use in Unix-like
> >> environments.
> >>
> > Not the main reason: communication protocols, and data storage is also
> > based on 8-bit code units (even if storage group them by much larger
> > blocks).
> There is some history here: