Request for Information

Asmus Freytag asmusf at
Fri Jul 25 14:47:14 CDT 2014

On 7/25/2014 8:49 AM, fantasai wrote:
> On 07/24/2014 06:45 PM, Whistler, Ken wrote:
>> Fantasai asked:
>>> I would like to request that Unicode include, *for each writing
>>> system it encodes*, some information on how it might justify.
>> Following up on the comment and examples provided by Richard
>> Wordingham, I'd like to emphasize a relevant point:
>> Scripts may be used for *multiple* (different) writing systems.
> Hence the use of "for each writing system" rather than "for each
> script" in the sentence you quote above.

But the sentence implies that Unicode encodes "writing systems".
That is not the case. Unicode encodes characters, which are elements
of scripts, and usually not specific to a given writing system.

The various "default" algorithms that Unicode publishes are an
attempt to deal with "plain text", where access to detailed
information about the writing system may not be available.

To be useful, these algorithms should allow modification (tailoring)
for situations where additional knowledge is available. But even
with a few published examples of such tailoring, this is a far cry
from providing information "for each writing system".

>> I think it would make more sense to turn fantasai's query on its
>> head, as it were: First categorize what kinds of systems of
>> justification there are, and then start filling in, from best
>> understood out to the fringes of knowledge of practice, what
>> writing systems (using what script or combination of scripts)
>> are attested as regularly using each system. Lacunae are
>> inevitable, however.
> Justification systems typically expand or compress spaces,
> and when that fails (becoming too small or too large, where
> the tolerances vary widely per writing system), fall back to
> "letter-spacing". The interaction of different levels of
> justification (e.g. spaces vs. letter-spacing) depends on
> the justification algorithms, and the tolerances for spacing
> adjustments depends on the writing system and the quality of
> the typesetter.

You left out the use of optional ligatures.

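The two-level scheme described in the quoted paragraph (stretch word spaces up to a tolerance, then fall back to letter-spacing) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a description of any particular engine: it handles stretching only, spreads slack evenly, and `max_space_stretch` is a hypothetical per-writing-system tolerance. Like the quoted description, it leaves out optional ligatures.

```python
def distribute_slack(slack, n_spaces, n_gaps, max_space_stretch):
    """Split `slack` (extra width needed to fill a line, >= 0) between
    word spaces and inter-letter gaps.

    Word spaces absorb slack first, up to `max_space_stretch` each (a
    hypothetical per-writing-system tolerance); only what is left over
    is spread evenly over the `n_gaps` letter gaps as letter-spacing.
    Returns (extra_per_space, extra_per_gap).
    """
    per_space = min(slack / n_spaces, max_space_stretch) if n_spaces else 0.0
    remainder = slack - per_space * n_spaces
    per_gap = remainder / n_gaps if remainder > 0 and n_gaps else 0.0
    return per_space, per_gap


# A line with 4 word spaces and 20 letter gaps, 12 units short:
print(distribute_slack(12.0, 4, 20, 2.0))  # (2.0, 0.2)
# A line with no word spaces at all, 5 units short:
print(distribute_slack(5.0, 0, 10, 2.0))   # (0.0, 0.5)
```

The second call shows why, in such a scheme, a writing system with few or no word spaces has to rely more heavily on letter-spacing.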
> It is my observation that systems with fewer spaces are more
> tolerant of letter-spacing.

My example (Fraktur) shows that other factors can come into play,
and that some writing systems can depart quite decisively from
other writing systems for the same script and, in this case, even
for the same language.

>      (For example, it would be nice if the standard mentioned
>      whether punctuation like the Javanese pada lingsa are expected
>      to be followed by a space character, so that font makers,
>      layout engineers, and typists can coordinate accordingly to
>      create the appropriate amount of white space on the screen.)

This information is more appropriate for Unicode to collect, since
it addresses conventions for how texts are encoded, which is central
to the mission of the Unicode Standard as a character encoding.

>   2. Which characters are "separable" for justification.
>      Some languages (like German) may suppress such separation.
>      And the rules for determining separable "clusters" can be
>      language and/or font-dependent.
>      However it can be said with certainty that Latin letters,
>      for example, are separable, whereas Arabic letters are not.

The minute you make a blanket statement like that, you promulgate
lowest-common-denominator behavior in software: any language
requiring tailoring will see a degradation, because such tailorings
are unlikely to be implemented.

The correct way to characterize Latin would be that while separation is
allowable in some languages, it is not preferred, and justification 
would normally apply a penalty for any line that requires it.

The penalty values for some languages are much higher, and may also
depend on the application. It's easy to find examples of newspaper
layout in the US that are quite tolerant (not to say over-tolerant) of
letter spacing, but those represent an extreme.
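The penalty model described in the two paragraphs above can be sketched as a line-cost function. The penalty table, its (language, context) keys, and the quadratic cost are assumptions chosen for illustration, loosely in the spirit of TeX's badness-and-penalty approach; none of the numbers come from a survey.

```python
# Hypothetical letter-spacing penalties per (language, context).
# The numbers are placeholders for illustration, not measured values.
LETTERSPACING_PENALTY = {
    ("en", "newspaper"): 1.0,   # tolerant context
    ("en", "book"): 20.0,
    ("de", "book"): 50.0,
}


def line_cost(per_space, per_gap, lang, context):
    """Badness of one justified line.

    Stretching word spaces costs the square of the stretch; any
    letter-spacing at all adds a fixed, language- and context-dependent
    penalty on top of its own squared stretch.
    """
    cost = per_space ** 2
    if per_gap > 0:
        cost += LETTERSPACING_PENALTY[(lang, context)] + per_gap ** 2
    return cost


# The same 12 units of slack on a line with 4 spaces and 20 letter gaps,
# filled either by spaces alone (4 x 3.0) or by 4 x 2.0 + 20 x 0.2.
for lang, context in LETTERSPACING_PENALTY:
    spaces_only = line_cost(3.0, 0.0, lang, context)
    mixed = line_cost(2.0, 0.2, lang, context)
    print(lang, context, "letter-space" if mixed < spaces_only else "spaces only")
```

With these placeholder values only the tolerant newspaper context prefers letter-spacing; raising a penalty is how a tailoring would express a stricter practice.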

The point I am trying to make is that justification goes much further
in the direction of "typography" than what is covered by UAX #14
(which covers line-breaking opportunities, but not how to select
among them for the best typographical result).
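To make the distinction concrete: UAX #14 yields the set of permissible break positions, and a separate step must choose among them. A minimal sketch of that second step is a small dynamic program, assuming one break opportunity after each word and a simple cost (the squared leftover space on every line but the last). This is a simplified stand-in for total-fit algorithms such as TeX's, not anything specified by Unicode.

```python
def choose_breaks(widths, space, line_width):
    """Choose line breaks among the opportunities (one after each word)
    so that the sum of squared leftover space over all lines except the
    last is minimal. Returns the index of the first word of each line.
    """
    n = len(widths)
    best = [float("inf")] * (n + 1)  # best[j]: min cost to set words[:j]
    prev = [0] * (n + 1)             # prev[j]: start of the line ending at j
    best[0] = 0.0
    for j in range(1, n + 1):                 # line ends after word j-1
        for i in range(j - 1, -1, -1):        # line starts at word i
            natural = sum(widths[i:j]) + space * (j - i - 1)
            # Always allow a single word, even if overfull; otherwise stop
            # once the line is too long (earlier starts only make it longer).
            if natural > line_width and i < j - 1:
                break
            slack = line_width - natural
            cost = 0.0 if j == n else slack ** 2  # last line may stay short
            if best[i] + cost < best[j]:
                best[j] = best[i] + cost
                prev[j] = i
    starts, j = [], n
    while j > 0:                              # walk back to recover lines
        starts.append(prev[j])
        j = prev[j]
    return starts[::-1]


print(choose_breaks([3, 2, 2, 5], 1, 6))  # [0, 1, 3]
```

A greedy first-fit breaker would fill the first line completely and return `[0, 2, 3]`, leaving a very loose middle line; the total-fit choice spreads the slack more evenly. Which cost function is "right" is exactly the typographic question that UAX #14 does not answer.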

By providing some generalized statements, you may do more harm
than good, because you would by necessity enshrine the lowest
common denominator. For "plain text" in Latin, you could argue
the case in reverse and state that using letter spacing is not "safe",
because you don't know whether the text is in a language or for
a context where letter-spacing should be given a high or very high
penalty value.
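The plain-text argument above can be expressed as a lookup with a deliberately conservative fallback: when the language is unknown, assume the strictest practice. The tag handling (primary language subtag only) and all penalty values are hypothetical.

```python
from typing import Optional

# Hypothetical per-language tailorings (placeholders, not survey data).
TAILORED_PENALTY = {"en": 20.0, "de": 50.0}

# Plain text carries no language information, so the only "safe"
# default is a very high penalty for letter-spacing.
PLAIN_TEXT_PENALTY = 1000.0


def letterspacing_penalty(lang: Optional[str]) -> float:
    """Penalty for letter-spacing, given a BCP 47 tag or None."""
    if lang is None:
        return PLAIN_TEXT_PENALTY
    primary = lang.split("-")[0].lower()
    return TAILORED_PENALTY.get(primary, PLAIN_TEXT_PENALTY)


print(letterspacing_penalty(None))     # 1000.0
print(letterspacing_penalty("en-GB"))  # 20.0
```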

Something similar applies to ligatures. Applying them to plain text
in Latin is inappropriate, because each language has, or may have,
rules for when ligatures are appropriate, required, or forbidden, and
these rules change over time (like the changes made in the 1990s
orthography reform for German, or the 1940 switch away from Fraktur).

Still, Latin does have ligatures, and there are writing systems where
justification uses fewer or more ligatures to control how much the
text is condensed.
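Using optional ligatures as a condensing device could be sketched as follows: a line is first set without them, and the ligatures a given language permits are formed only when the line would otherwise overflow. Glyph widths, the ligature table, and the `allowed` sets are hypothetical; real ligature rules vary by language and over time, as noted above.

```python
# Hypothetical advance widths (arbitrary units); unlisted letters get 4.
WIDTH = {"f": 3, "i": 2, "l": 2, "t": 3, "a": 5}
# Hypothetical optional ligatures and their (narrower) widths.
LIGATURE_WIDTH = {"fi": 4, "fl": 4}


def word_width(word, allowed):
    """Width of `word`, forming the two-letter ligatures in `allowed`
    (which must be a subset of LIGATURE_WIDTH's keys)."""
    total, k = 0, 0
    while k < len(word):
        pair = word[k:k + 2]
        if pair in allowed:
            total += LIGATURE_WIDTH[pair]
            k += 2
        else:
            total += WIDTH.get(word[k], 4)
            k += 1
    return total


def fit_line(words, space, line_width, allowed):
    """Set the line without optional ligatures; form those in `allowed`
    only if the line would otherwise overflow.
    Returns (width, used_ligatures)."""
    gaps = space * (len(words) - 1)
    plain = sum(word_width(w, set()) for w in words) + gaps
    if plain <= line_width or not allowed:
        return plain, False
    return sum(word_width(w, allowed) for w in words) + gaps, True


print(fit_line(["fit", "flat"], 2, 25, {"fi", "fl"}))  # (23, False)
print(fit_line(["fit", "flat"], 2, 22, {"fi", "fl"}))  # (21, True)
print(fit_line(["fit", "flat"], 2, 22, {"fl"}))        # (22, True)
```

The last call stands in for a tailoring that permits fewer ligatures: the same text condenses less.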

But, as can be seen from these examples, a global statement
on the script level is not helpful, unless a practice is really
universal and can safely be applied to plain text.

>      This information is mostly represented in UAX29, with the
>      exception that there's no really clear information on
>      which scripts are "cursive" (have inseparable grapheme
>      clusters).
> There are exceptional systems:
>   - Arabic can use cursive elongation for justification.
>   - Japanese and Chinese can compress the inherent "spaces"
>     within the full-width glyphs of certain punctuation.
>   - Tibetan can use tsek marks as filler for justification.
>     (Which is, by the way, discussed *extensively* in the Unicode
>     standard, so you can't tell me that the Unicode Consortium
>     considers notes on common justification practices to be out
>     of scope.)

What you are asking for is, in effect, a "survey of typographical
practices". This would be a fine project if published in some
form that is independent of the Unicode Standard, for example
as a technical note.

That survey would need to focus on *writing systems*, and not
on script properties, because the latter are really inappropriate
for making the correct choice.
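If such a survey were ever captured as data, the point above suggests keying entries by writing system (here, language plus ISO 15924 script code, where `Latf` denotes the Fraktur variant of Latin) rather than by script alone. A minimal sketch; the entries and values are placeholders, not findings.

```python
from typing import Optional

# Placeholder letter-spacing penalties keyed by (language, ISO 15924
# script), not by script alone. Values are illustrative only.
SURVEY = {
    ("en", "Latn"): 20.0,
    ("de", "Latn"): 50.0,
    ("de", "Latf"): 1000.0,
}


def lookup(lang: str, script: str) -> Optional[float]:
    """Return the surveyed penalty for a writing system, or None.

    There is deliberately no fallback to a script-wide default: the
    entries above differ even though all of them use the Latin script.
    """
    return SURVEY.get((lang, script))


print(lookup("de", "Latf"))  # 1000.0
print(lookup("fr", "Latn"))  # None
```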
