Tag characters and in-line graphics (from Tag characters)

Philippe Verdy verdy_p at wanadoo.fr
Wed Jun 3 20:27:27 CDT 2015

2015-06-04 2:59 GMT+02:00 David Starner <prosfilaes at gmail.com>:

> You can’t iterate over compressed bits. You can’t process them.
> Why not? In any language I know of that has iterators, there would be no
> problem writing one that iterates over compressed input. If you need to
> mutate them, that is hard in compressed formats, but a new CPU can store
> War and Peace in the on-CPU cache.

You're right, today the CPU is no longer the bottleneck. The bottlenecks
are now:
* the speed of long buses and communication links, with their limited (and
costly) bandwidth, since this is a shared medium used by more and more
people and requiring massive infrastructure, or the physical constraints
even on the fastest serial buses; both imply transmission round-trip times
(limiting random access, which is a severe problem now that we have to
access extremely large volumes of data distributed over multiple devices or
over a full network);
* the storage capacity for the fastest storage medium (such as flash
memory, which is the only option for mobile devices, but also the most
In both cases you need compression (the second bottleneck, on storage
volumes, will fade out in a few years, but not the bandwidth constraints).
It really pays now to use compression schemes, even the most complex ones
such as those used to transmit live video: locally, a CPU or GPU will easily
handle the compression scheme.
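
To make the point about iterating over compressed input concrete, here is a
minimal sketch in Python. It assumes the text is UTF-8 compressed with zlib
and arrives in arbitrary chunks (as it would over a network link); the names
iter_chars and chunked are only illustrative, not taken from any existing
library.

```python
import codecs
import zlib
from typing import Iterable, Iterator


def iter_chars(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield code points one at a time from zlib-compressed UTF-8 chunks."""
    inflater = zlib.decompressobj()
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        # The incremental decoder buffers any multi-byte sequence that is
        # split across two chunks, so only complete code points are yielded.
        yield from decoder.decode(inflater.decompress(chunk))
    # Drain whatever the decompressor still holds and finish decoding.
    yield from decoder.decode(inflater.flush(), final=True)


def chunked(data: bytes, size: int) -> Iterator[bytes]:
    """Split a byte string into pieces of at most `size` bytes (demo helper)."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


if __name__ == "__main__":
    text = "War and Peace: Война и мир"
    packed = zlib.compress(text.encode("utf-8"))
    # Feed the compressed stream three bytes at a time.
    assert "".join(iter_chars(chunked(packed, 3))) == text
```

The caller only ever sees a forward iterator over code points; the
uncompressed text never has to exist in memory as a whole. Random access and
in-place mutation are a different matter, as noted above.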

Research on compression schemes is far from over; it has never been as
active as it is today, including for text, because of the explosion of data
volumes, even if the volume of text is now largely dwarfed by the volume of
images, video and audio. But you can't compute a lot of things from
audio/image/video data sources: we still need text to give semantics to
these media, from which you can then derive data or perform searches. There
is still a lot to do for handling images and speech audio and detecting some
semantics in them, but you won't get as much information from audio or
video as what can be represented by text. OCR, for example, is a very
heuristic process that produces lots of false guesses, still far more than
human brains do across the broad range of variations that we call
"cultures"; computers are still very poor at recognizing cultures with as
many variations as those we recognize through social interactions and years
of education and *personal* experience.