Private Use areas

Janusz S. Bień via Unicode unicode at unicode.org
Wed Aug 29 00:47:47 CDT 2018


On Tue, Aug 28 2018 at  9:43 -0700, unicode at unicode.org writes:
> On August 23, 2011, Asmus Freytag wrote:
>
>> On 8/23/2011 7:22 AM, Doug Ewell wrote:
>>> Of all applications, a word processor or DTP application would want
>>> to know more about the properties of characters than just whether
>>> they are RTL. Line breaking, word breaking, and case mapping come to
>>> mind.
>>>
>>> I would think the format used by standard UCD files, or the XML
>>> equivalent, would be preferable to making one up:

Right. I was not so quick to state this so early, but 2 years ago I
wrote to the MUFI list:


--8<---------------cut here---------------start------------->8---
On Sat, Jan 02 2016 at 12:35 CET, odd.haugen at uib.no writes:

[...]

> Note the permanent URI at the University Library in Bergen. This will
> in all likelihood be the last recommendation of its kind (and
> certainly the last edited by the undersigned), so please look out for
> new solutions (databases or the like) on the MUFI web site!

I think that one of the forms, perhaps even the primary one, should
follow the original Unicode Character Database and the
output of Unibook (http://www.unicode.org/unibook/).

The idea can be tested by converting the present recommendation to this
form. Unfortunately I'm unable to contribute myself to this task.

One of the advantages would be that the various character browsers can
be adapted relatively easily to provide info about the MUFI characters.

A simpler variant of this idea is to use Unibook-like format to
document fonts. A quick-and-dirty tools for this purpose has been
prepared by a student of mine:

https://bitbucket.org/jsbien/fntsample-fork-with-ucd-comments/
https://bitbucket.org/jsbien/unicode-ucd-parser

A sample output of the tools is available at

https://bitbucket.org/jsbien/parkosz-font/downloads/Parkosz1907draft.pdf

(the font is also quick-and-dirty and unfinished work).

--8<---------------cut here---------------end--------------->8---

Unfortunately there was no reaction.

>>
>> The right answer would follow the XML format of the UCD.
>>
>> That's the only format that allows all necessary information contained
>> in one file,

For me necessary are also comments and crossreferences contained in
NamesList.txt. Do I understand correctly that only "ISO Comment
properties" are included in the file?

>> and it would leverage of any effort that users of the
>> main UCD have made in parsing the XML format.
>>
>> An XML format shold also be flexible in that you can add/remove not
>> just characters, but properties as needed.
>>
>> The worst thing do do, other than designing something from scratch,
>> would be to replicate the UnicodeData.txt layout with its random, but
>> fixed collection of properties and insanely many semi-colons. None of
>> the existing UCD txt files carries all the needed data in a single
>> file.
>
> I don't know if or how I responded 7 years ago, but at least today, I
> think this is an excellent suggestion.
>
> If the goal is to encourage vendors to support PUA assignments, using an
> exceedingly well-defined format (UAX #42) sitting atop one of the most
> widely used base formats ever (XML), with all property information in a
> single repository (per PUA scheme), would be great encouragement.

I think we need also the data in the format acceptable by UniBook.

> I've devised lots of novel file formats and I think this is one use
> case where that would be a real hindrance.

> Storing this information in a font, by hook or crook, would lock users
> of those PUA characters into that font. At that rate, you might as well
> use ASCII-hacked fonts, as we did 25 years ago.

Storing the information in a font is inappropriate not only for the
technical reasons, as I wrote recently (on Thu, Aug 23 2018)

> Fonts are for *rendering*, new characters and variants are more and
> more often needed for *input* of real life old texts with sufficient
> precision.

Best regards

Janusz

-- 
             ,   
Janusz S. Bien
emeryt (emeritus)
https://sites.google.com/view/jsbien


More information about the Unicode mailing list