What's the process for proposing a symbol in the Unicode table?

Asmus Freytag asmusf at ix.netcom.com
Sun Feb 18 02:18:20 CST 2024


On 2/17/2024 12:02 PM, Christoph Päper via Unicode wrote:
> Asmus Freytag via Unicode<unicode at corp.unicode.org>:
>> We usually don't encode characters intended for use in handwriting, except if they are needed to digitally archive manuscripts. Not sure grade school papers pass that bar.
> Every piece of writing might be digitally archived nowadays, even more so in the future. Therefore, every _established_ literal atomic sign should be encodable, so it can be unambiguously read by machines. I strongly believe this includes paralinguistic signs, whereas nonlinguistic signs (e.g. much of ISO 7000) would require an extension of the scope of Unicode (although several graphic symbols from that and other standards already have a codepoint assigned to them).
>
> This one is clearly well established, i.e. has at least one canonical form and meaning, even if its use is geographically limited. It cannot be represented by a combination of other, already encoded characters.
>
That's an argument a proposal could make, but I'm not sure I'm ready to 
agree with that analysis.

Even if we approach 100% digital archiving, not everything can be, will 
be or needs to be archived as *plain text*. (Or even rich text).

Manuscripts are a good example of handwritten text that benefits from 
conversion to digital text, because they are the subject of intense 
scholarship that would benefit from having the usual array of digital 
text processing available, such as search, and convenient rendering of 
excerpts.

People are studying the marks accompanying cave paintings, such as 
lines, circles or dots. One even resembles a hash mark #, making that 
arguably the oldest uniquely recognizable symbol ever encoded as a 
character. (Aside: dots and lines don't count, because we encode many 
different dots and lines).

For those studies, there's no overriding need to place the symbols into 
running text, or to attempt to show sequences of them as plain text. 
Therefore, such use alone is not sufficient rationale for deciding what 
constitutes an abstract character, providing a standardized encoding, 
and assigning properties such as line-breaking behavior.

The Dutch mark in question is interesting in that it's clearly 
associated with a well-defined concept and has a recognizable (and 
conventional) shape. Neither of those two aspects presents any obstacle 
to encoding. However, the need to represent it in plain text has to be 
established, and any successful proposal will have to provide an 
argument that is specific and to the point.

The mere claim of a general principle as suggested above is not 
sufficient to make a persuasive argument for a specific encoding.

A./