Variation Sequences (and L2-11/059)

Mon Jul 16 13:00:29 CDT 2018

Hi

> I ask the question because there are now several historical corpora of Polish under development, which use at present a kind of fall-back or some other ad hoc solutions for "nonce glyphs", as they are called in the FAQ.

Could you please say what the "kind of fall-back or some other ad hoc solutions" are?

I ask because I have thought of a possible solution to the problem that has graceful fall-back and uses only plane 0 characters, with no Private Use Area characters at all. I am wondering whether my suggestion would be of use, or whether it is just one more method to add to the collection of "kind of fall-back or some other ad hoc solutions".

My suggestion is to use for each desired glyph a sequence consisting of three characters, and then have an OpenType font decode them so that the glyph can be displayed.

Each such sequence would be of the form:

base character, then ZERO WIDTH JOINER, then a circled digit character or a circled number character.

http://www.unicode.org/charts/PDF/U2460.pdf

Thus there would be up to twenty specific glyphs for each base character.

The list of glyphs could be gradually extended as needed. If an attempt is made to display a newly added glyph using a font implemented from an earlier list, there would be graceful fall-back to the base character followed by a circled digit.
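
As a rough sketch of how such sequences might be built and decoded, the following Python fragment simulates what the font's glyph substitution would do. The function names and the glyph table are hypothetical and only for illustration; in practice the decoding would be done by the OpenType substitution rules in the font, not by Python code.

```python
ZWJ = "\u200D"            # ZERO WIDTH JOINER
CIRCLED_FIRST = 0x2460    # CIRCLED DIGIT ONE
CIRCLED_LAST = 0x2473     # CIRCLED NUMBER TWENTY


def encode_variant(base, n):
    """Build the three-character sequence: base, ZWJ, circled number n (1..20)."""
    if len(base) != 1:
        raise ValueError("base must be a single character")
    if not 1 <= n <= 20:
        raise ValueError("only circled numbers 1 to 20 are available")
    return base + ZWJ + chr(CIRCLED_FIRST + n - 1)


def render(text, font_glyphs):
    """Simulate a font (hypothetical model, not a real font API).

    font_glyphs maps (base, n) to a glyph name for each variant the font knows.
    Sequences the font does not know are left as their separate characters,
    which is the graceful fall-back: base character, ZWJ, circled digit.
    """
    out = []
    i = 0
    while i < len(text):
        if (i + 2 < len(text)
                and text[i + 1] == ZWJ
                and CIRCLED_FIRST <= ord(text[i + 2]) <= CIRCLED_LAST):
            key = (text[i], ord(text[i + 2]) - CIRCLED_FIRST + 1)
            if key in font_glyphs:
                out.append(font_glyphs[key])  # one glyph replaces all three
                i += 3
                continue
        out.append(text[i])  # ordinary character or fall-back
        i += 1
    return out


# A font made from an earlier list that knows only variant 1 of "a".
old_font = {("a", 1): "a.variant1"}
print(render(encode_variant("a", 1) + "b", old_font))
print(render(encode_variant("a", 2), old_font))
```

In the second call the font does not know variant 2, so the base character, the joiner and the circled digit come through unchanged, and the reader sees the base letter followed by a circled 2.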

It would be helpful for entering text into documents if the ZERO WIDTH JOINER character had a visible glyph within the font. Entering text with OpenType glyph substitution turned off could then be easier to carry out.

I am wondering how acceptable such a solution would be for standardization. The list of ways in which something can be encoded using a ZWJ (ZERO WIDTH JOINER) character seems recently to have been extended de facto for use in emoji sequences. That use does not involve circled digits, but it does use ZWJ to change meaning, which is a far bigger extension than this suggestion needs, as meaning would often be unaltered when using this suggestion.

William Overington

Monday 16 July 2018

----Original message----
From : unicode at unicode.org
Date : 2018/07/16 - 06:07 (GMTDT)
To : unicode at unicode.org
Subject : Variation Sequences (and L2-11/059)

FAQ (http://unicode.org/faq/vs.html) states:

    For historic scripts, the variation sequence provides a useful tool,
    because it can show mistaken or nonce glyphs and relate them to the
    base character. It can also be used to reflect the views of
    scholars, who may see the relation between the glyphs and base
    characters differently. Also, new variation sequences can be added
    for new variant appearances (and their relation to the base
    characters) as more evidence is discovered.

It states also:

   What variation sequences are valid?
   Only those listed in StandardizedVariants.txt...

However the file in question contains only sections for mathematics and
some rather exotic scripts.

To the best of my knowledge, the only attempt to introduce additional
variation sequences was Karl Pentzlin's strongly criticised proposal
L2-11/059

http://www.unicode.org/L2/L2011/11059-latin-cyr-var.pdf

What has happened to it? I don't remember any information about it on the
list.

However my primary question is:

Are variation sequences *really* recommended for historical scripts?

I ask the question because there are now several historical corpora of
Polish under development, which use at present a kind of fall-back or
some other ad hoc solutions for "nonce glyphs", as they are called in
the FAQ.

Best regards

Janusz

-- 
             ,   
Janusz S. Bien
emeryt (emeritus)
https://sites.google.com/view/jsbien