Variation Sequences (and L2-11/059)

Janusz S. Bień via Unicode unicode at unicode.org
Fri Jul 20 00:04:48 CDT 2018


On Thu, Jul 19 2018 at 17:47 +0100, wjgo_10009 at btinternet.com writes:
> Janusz S. Bien wrote:
>
>> You seem to assume that my concern is only rendering.
>
> Well my thinking is that what you are wanting is a way to accurately
> transcribe documents and maybe printed books from Old Polish into a
> Unicode-based electronic format so that the information can be more
> readily studied, while retaining glyph information that is not
> presently representable using Unicode characters.

That's right.

As long as we have no corpus tools able to handle variation sequences,
both variation sequences and your proposal can be considered just a form
of transcription, and your proposal may perhaps have a slight advantage.

However, if somebody has the time and/or money to implement new corpus
software, it makes more sense in my opinion to implement standard
variation sequences.
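
To make concrete what "handling variation sequences" could mean for a
corpus tool, here is a minimal sketch in Python. It is only an
illustration under stated assumptions: the function names are
hypothetical, the sequence U+00A7 U+FE00 is NOT a registered variation
sequence and merely stands in for a mirrored section sign, and only the
two main variation selector ranges are covered (the Mongolian free
variation selectors are left out).

```python
from collections import Counter


def is_variation_selector(ch):
    """True for VS1-VS16 (U+FE00..U+FE0F) and VS17-VS256 (U+E0100..U+E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def split_units(text):
    """Split text into units, keeping each variation selector attached
    to the character that precedes it."""
    units = []
    for ch in text:
        if units and is_variation_selector(ch):
            units[-1] += ch
        else:
            units.append(ch)
    return units


def count_variants(text, base):
    """Count the plain and variant occurrences of one base character."""
    counts = Counter()
    for unit in split_units(text):
        if unit[0] == base:
            counts[unit] += 1
    return counts


# Hypothetical sample: U+00A7 U+FE00 is used for illustration only.
sample = "\u00A7 1 \u00A7\uFE00 2 \u00A7 3"
counts = count_variants(sample, "\u00A7")
for unit, n in counts.items():
    print(" ".join(f"U+{ord(c):04X}" for c in unit), n)
```

A concordance built on such units would list the plain and the variant
forms separately, which is the kind of analysis I have in mind.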

Of course, sticking to the standard makes sense if the standard is
reasonable. In my opinion Unicode was designed with only one application
in mind: some text is input on the keyboard and has to be rendered after
some processing. However, due to mass digitization we quite often have
the reverse situation: we have scans with graphical objects which
might be difficult to identify, we have to analyse the text somehow, and
identifying the Unicode characters is the final part of the research. To
be more specific, I will quote my response to David Perry on the MUFI
list:

On Fri, Jul 20 2018 at  6:54 +0200, jsbien at mimuw.edu.pl writes:
> On Wed, Jul 18 2018 at 13:33 -0700, [...] writes:

[...]

>> If you are working to digitize the Polish dictionary you mentioned,
>> the first step would be to determine whether there is any difference
>> in meaning between the two versions of the section sign. If not, just
>> encode them all with U+00A7.
>
> I beg to disagree.
>
> The difference should be encoded in some way (at the moment I plan to
> use a simple transcription like §⤾ for SECTION SIGN mirrored), then their
> occurrences analysed with some corpus tools (concordances etc.), and
> finally an opinion formulated about the function of the distinction or
> the lack of it.

On the other hand, I was just surprised by the information from David
Perry, who said on the MUFI list:

> Note, however, that most applications check whether VSs have been
> registered for the script in use and, if not, they will not display
> the variants even if a font maker has put them in. (I tried
> . . . )

If the Consortium is reluctant to register new sequences and the
software strictly adheres to the standard, then there will be a
problem.

> I found the following.
>
> https://en.wikipedia.org/wiki/Old_Polish_language

Thank you for your interest in the Polish language. I will answer the
rest of your post a little later.

Best regards

Janusz

-- 
             ,   
Janusz S. Bien
emeryt (emeritus)
https://sites.google.com/view/jsbien


