Variations and Unifications ?

Philippe Verdy verdy_p at wanadoo.fr
Thu Mar 17 01:11:33 CDT 2016


"Disunification may be an answer?" We should avoid it as well.

We have other solutions in Unicode:
- variation selectors (often used for sinograms when their unified shapes
must be distinguished in some contexts, such as personal names, toponyms,
trademark names, or other specific contexts),
- or combining sequences (including in Arabic or Hebrew, where many
combining characters are not always represented visually; the same occurs
in Latin, where accents are not always shown over capitals),
- or sequences of multiple characters (as in emoji for skin color
variants, or the sequences encoding flags),
- or other sequences using joiners (e.g. in South Asian scripts).
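
To make these four mechanisms concrete, here is a small Python sketch that
prints the code points making up one sample sequence of each kind. The
sample sequences are my own illustrative picks, not taken from any
proposal, and I have not checked that this particular ideographic
variation sequence is registered; the script only shows how each sequence
is built from existing characters.

```python
import unicodedata

# Illustrative samples only (assumptions chosen for this sketch).
samples = [
    # Sinogram followed by VARIATION SELECTOR-17 (shape of an ideographic
    # variation sequence; registration of this exact pair is not checked).
    ("variation selector", "\u845B\U000E0100"),
    # Base letter followed by a combining accent.
    ("combining sequence", "e\u0301"),
    # Emoji followed by a skin tone modifier.
    ("emoji modifier sequence", "\U0001F44D\U0001F3FD"),
    # Two regional indicator symbols (F, R) encoding a flag.
    ("flag sequence", "\U0001F1EB\U0001F1F7"),
    # Devanagari KA, VIRAMA, ZERO WIDTH JOINER, SSA.
    ("joiner sequence", "\u0915\u094D\u200D\u0937"),
]


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


for label, text in samples:
    print(label)
    for line in describe(text):
        print("   ", line)
```

In every case the distinction is carried by a sequence of already encoded
characters, so no new code point is needed.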

Disunification is only acceptable when
- there's a complete disunification of concepts and the "similar" shapes
are also different, even if one originates from the other (e.g. the Latin
slashed o disunified from the Latin o, even though there's also the
sequence o+combining slash, almost never used because its rendering is too
approximate in most cases),
- or there's a clear distinction of semantics and properties (e.g. the
Latin AE ligature, which is not appropriately represented by the two
separate letters, not even with a "hinting" joiner, and which has specific
properties as a plain letter, e.g. with mappings).
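
Both examples can be checked against the character database. The sketch
below uses Python's unicodedata module and assumes that "combining slash"
means U+0338 COMBINING LONG SOLIDUS OVERLAY (U+0337 would be the other
candidate); the helper name is hypothetical.

```python
import unicodedata

O_SLASH = "\u00F8"        # LATIN SMALL LETTER O WITH STROKE
O_PLUS_SLASH = "o\u0338"  # o + COMBINING LONG SOLIDUS OVERLAY (assumed)
AE = "\u00C6"             # LATIN CAPITAL LETTER AE


def is_normalization_stable(text):
    """True if none of NFC, NFD, NFKC, NFKD changes the text."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return all(unicodedata.normalize(f, text) == text for f in forms)


# The slashed o is an atomic letter: it never decomposes into o + overlay,
# and o + overlay never composes into it, so the two stay distinct.
print(is_normalization_stable(O_SLASH))
print(is_normalization_stable(O_PLUS_SLASH))
print(O_SLASH == unicodedata.normalize("NFC", O_PLUS_SLASH))

# AE is a plain letter with its own properties: general category Lu,
# no decomposition into A + E, and its own lowercase mapping.
print(unicodedata.category(AE))
print(is_normalization_stable(AE))
print(f"U+{ord(AE.lower()):04X}")
```

By contrast, an accented letter such as U+00E9 does decompose canonically
into e + U+0301, which is the kind of equivalence that a sequence-based
solution can rely on.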

Before disunifying a character, we should first study the alternative of
representing it as a sequence.

2016-03-16 18:34 GMT+01:00 Asmus Freytag (t) <asmus-inc at ix.netcom.com>:

> On 3/15/2016 8:14 PM, David Faulks wrote:
>
> As part of my investigations into astrological symbols, I'm beginning to wonder if glyph variations are justifications for separate encoding of symbols I would have previously considered the same or unifiable with symbols already in Unicode.
>
> For example, the semisquare aspect is usually shown with a glyph that is identical to ∠ (U+2220 ANGLE). However, sometimes it looks like <, or like ∟ (U+221F RIGHT ANGLE). Would this be better encoded as a separate codepoint?
>
> The parallel aspect, similarly, sometimes looks like ∥ (U+2225 PARALLEL TO), but is often shown as // or ⫽ (U+2AFD DOUBLE SOLIDUS OPERATOR). This is not a typographical kludge since astrological fonts often show it this way.
> There is also contra-parallel, which sometimes is shown like ∦ (U+2226 NOT PARALLEL TO), but has variant glyphs with slanted lines (and the crossbar is often horizontal).
>
> The ‘part of fortune’ is sometimes a circled ×, or sometimes a circled +.
>
> Would it be better to have dedicated characters than to assume unifications in these cases?
>
>
>
> My take is that for symbols there's always that tension between encoding
> the "concept" or encoding the shape. In my view, it is often impossible to
> answer the question whether the different angles (for example) are merely
> different "shapes" of one and the same "symbol", or whether it isn't the
> case that there are different "conventions" (using different symbols for
> the same concept).
>
> Disunification is useful, whenever different concepts require distinct
> symbol shapes (even if there are some general similarities). If other
> concepts make use of the same shapes interchangeably, it is then up to the
> author to fix the convention by selecting one or the other shape.
> Conceptually, that is similar to the decimal point: it can be either a
> period, or a comma, depending on locale (read: depending on the convention
> the author follows).
>
> Sometimes, concepts use multiple symbol shapes, but all of these shapes
> map to the same concept (and other uses are not known). In that case,
> unifying the shapes might be acceptable. The selection of shape is then a
> matter of the font (and may not always be under the control of the author).
> Conceptually, that is similar to the integral sign, which can be slanted or
> upright. The choice is one of style. While authors or readers may prefer
> one look over the other, the identity of the symbol is not in question, and
> there's no impact on transmission of the contents of the text.
>
> Whenever we have the former case, that is, multiple conventional
> presentations that are symbols in their own right in other contexts, then
> encoding an additional "generic" shape should be avoided. Unicode
> explicitly did not encode a generic "decimal point". If the convention that
> is used matters, the author is better off being able to select a specific
> shape. The results will be more predictable. The downside is that a search
> will have to cover all the conventions. Conceptually, that is no different
> from having to search for both "color" and "colour".
>
> The final case is where a convention for depicting a concept uses a symbol
> that itself has some variability (for example when representing some other
> concepts), such that some of its forms make it less than ideal for the
> conventional use intended for the concept in question. Unicode has
> historically not always been able to provide a solution. In some of these
> cases, plain text (that is, without a fixed font association) may simply
> not give the desired answer. If specialized fonts for the convention (e.g.
> astrological fonts) do not usually exist or can't be expected, then
> disunifying the symbol's shapes may be an answer.
>
> A./
>