Private Use areas

Tue Aug 28 11:43:01 CDT 2018

On August 23, 2011, Asmus Freytag wrote:

> On 8/23/2011 7:22 AM, Doug Ewell wrote:
>> Of all applications, a word processor or DTP application would want
>> to know more about the properties of characters than just whether
>> they are RTL. Line breaking, word breaking, and case mapping come to
>> mind.
>>
>> I would think the format used by standard UCD files, or the XML
>> equivalent, would be preferable to making one up:
>
> The right answer would follow the XML format of the UCD.
>
> That's the only format that allows all necessary information contained
> in one file, and it would leverage of any effort that users of the
> main UCD have made in parsing the XML format.
>
> An XML format shold also be flexible in that you can add/remove not
> just characters, but properties as needed.
>
> The worst thing do do, other than designing something from scratch,
> would be to replicate the UnicodeData.txt layout with its random, but
> fixed collection of properties and insanely many semi-colons. None of
> the existing UCD txt files carries all the needed data in a single
> file.

I don't know if or how I responded 7 years ago, but at least today, I
think this is an excellent suggestion.

If the goal is to encourage vendors to support PUA assignments, using an
exceedingly well-defined format (UAX #42) sitting atop one of the most
widely used base formats ever (XML), with all property information in a
single repository (per PUA scheme), would be great encouragement. I've
devised lots of novel file formats and I think this is one use case
where that would be a real hindrance.

Storing this information in a font, by hook or crook, would lock users
of those PUA characters into that font. At that rate, you might as well
use ASCII-hacked fonts, as we did 25 years ago.

--
Doug Ewell | Thornton, CO, US | ewellic.org