Tag characters and in-line graphics (from Tag characters)

John idou747 at gmail.com
Thu May 28 21:37:25 CDT 2015

"Today the world goes very well with HTML(5) which is now the best markup language for documents (including for inserting embedded images that don’t require any external request)"

If I had a large document that reused a particular character thousands of times, would this HTML markup require embedding that character thousands of times, or could I define the character once at the beginning of the sequence, and then refer back to it in a space-efficient way?

At least part of the reason for having a code system, rather than just pixels and images, is to encode data efficiently and consistently. Unicode has private use ranges of codes. I can see an argument that it would be desirable to be able to send someone text with private use ranges and have the header define some default renderings. I’m not sure that replacing a document of 100,000 characters with 100,000 embedded HTML5 <img> tags is the same thing. It would be inefficient in space, impossible to process (e.g. to find all the instances of a particular character or sequence), and so forth.
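
To make the comparison concrete, here is a rough sketch in Python. Everything in it is my own assumption for illustration: the idea of a "header" that maps a private use code point to an SVG rendering, the names GLYPH, header and body, and the sample SVG itself. None of this is a format that Unicode or HTML defines. It only shows that defining the rendering once keeps the document small and keeps searching trivial.

```python
import base64

# Hypothetical private use character (U+E000 is the first code point of the
# Private Use Area in the Basic Multilingual Plane).
GLYPH = "\uE000"

# Hypothetical header: each private use code point is defined once.
header = {
    GLYPH: '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 10 10">'
           '<circle cx="5" cy="5" r="4"/></svg>',
}

# A body that reuses the same private use character a thousand times.
body = ("abc" + GLYPH) * 1000


def inline_img_size(body, header):
    """UTF-8 size if every private use character becomes an inline <img> tag."""
    out = body
    for ch, svg in header.items():
        data = base64.b64encode(svg.encode("utf-8")).decode("ascii")
        out = out.replace(ch, f'<img src="data:image/svg+xml;base64,{data}">')
    return len(out.encode("utf-8"))


def header_size(body, header):
    """UTF-8 size if each rendering is stored once, followed by the plain body."""
    header_bytes = sum(
        len(ch.encode("utf-8")) + len(svg.encode("utf-8"))
        for ch, svg in header.items()
    )
    return header_bytes + len(body.encode("utf-8"))


# Finding every instance is an ordinary string operation on the body.
occurrences = body.count(GLYPH)

print("occurrences:", occurrences)
print("header + body bytes:", header_size(body, header))
print("inline <img> bytes:", inline_img_size(body, header))
```

With the inline approach, the same search would mean matching a long base64 string inside markup rather than a single code point.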

Given that it’s been agreed that private use ranges are a good thing, and given that we can agree that exchanging data is a good thing, maybe something should bring those two things together. Just a thought.


On Fri, May 29, 2015 at 9:45 AM, Mark E. Shoulson <mark at kli.org> wrote:

> As was pointed out to me, essentially what you are saying is you reject 
> my premise that one size does not fit all.  You would prefer 
> *everything* be in plain text, "so you wouldn't have to use other 
> formats for it."  You're essentially converting plain text into THE 
> format for everything.
> But it isn't suited for that.  If you really believe one size should fit 
> all in this way, I think the problem is that pretty much all of the rest 
> of the computer science community doesn't agree with you.  Sorry.
> ~mark
> On 05/28/2015 07:50 AM, William_J_G Overington wrote:
>> Responding to Mark E. Shoulson:
>> The big advantage of this new format is that the result is an unambiguous Unicode plain text file and could be placed within a file of plain text without having to make the whole document a markup file to some format. Plain text is the key advantage.
>> The following may be useful as a guide to the original problem that I am trying to solve.
>> http://www.unicode.org/reports/tr51/tr51-2.html#Longer_Term
>> I tried to apply the brilliant new "base character followed by tag characters" format to the problem.
>> In the future, maybe Serif DrawPlus will have the ability to export a picture to this new format.
>> William Overington
>> 28 May 2015