Proposing mostly invisible characters
Christoph Päper via Unicode
unicode at unicode.org
Thu Sep 12 07:53:45 CDT 2019
There are some characters that have no precedent in existing encodings and are also hard
to attest directly from printed sources. Can one still make a solid case for encoding those in Unicode?
I am thinking of characters that are either invisible (most of the time) or can become invisible under certain circumstances.
- HYPHEN U+2010 is *always* rendered as a hyphen (i.e. a centered horizontal bar glyph),
which may look identical to HYPHEN-MINUS U+002D.
- SOFT HYPHEN (SHY) U+00AD is *only* rendered as a hyphen *when* it appears at the end of a line.
- At least four existing math operators are *never* rendered with a visible glyph
and only explicitly encode semantics where syntax is potentially ambiguous otherwise:
* FUNCTION APPLICATION U+2061
is used where no multiplication is implied,
e.g. between an alphabetic function variable and an opening parenthesis: f(x).
* INVISIBLE TIMES U+2062
is used where multiplication by either MULTIPLICATION SIGN U+00D7 or MIDDLE DOT U+00B7 is implied,
e.g. between a number and an alphabetic variable, constant or parenthesis: 2πr(a+b)
* INVISIBLE SEPARATOR U+2063
is used where enumeration by a COMMA U+002C or SEMICOLON U+003B (and possibly whitespace) is implied,
e.g. between two single-letter variable indices: aᵢⱼ.
* INVISIBLE PLUS U+2064
is used where addition by PLUS SIGN U+002B is implied,
e.g. between an integer and a vulgar fraction: 1⅔.
The following candidates are not encoded yet:
- INVERSE SOFT HYPHEN (ISHY) or SOFT INVISIBLE HYPHEN (SIHY)
is *always* rendered as a hyphen *unless* it appears at the end of a line.
- INVISIBLE HYPHEN (IHY) or ZERO-WIDTH HYPHEN (ZWH)
is *never* rendered as a hyphen,
*but* the word it appears in is treated as if it contained one at its position.
- INVERSE SOFT COMMA (ISC) or SOFT INVISIBLE COMMA (SIC)
is *always* rendered as a comma *unless* it appears at the end of a line.
- INVISIBLE OPEN PARENTHESIS (IOP) and INVISIBLE CLOSE PARENTHESIS (ICP)
*should not* be rendered with a visible glyph, but *may* be for inline fallback.
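
The characters in the list above that are already encoded can be inspected with Python's standard `unicodedata` module. This is only a small sketch for orientation; the helper names `describe` and `is_format_control` are made up for this example. It shows that the two visible hyphens are dash punctuation, while SHY and the four invisible math operators are format controls.

```python
import unicodedata

# Characters from the list above that already exist in Unicode.
EXISTING = ["\u002D", "\u2010", "\u00AD", "\u2061", "\u2062", "\u2063", "\u2064"]


def describe(ch):
    """Return (code point label, character name, general category)."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


def is_format_control(ch):
    """True for characters in general category Cf (format controls)."""
    return unicodedata.category(ch) == "Cf"


if __name__ == "__main__":
    for ch in EXISTING:
        print(*describe(ch))
```

The proposed characters have no code points, so they cannot be looked up this way.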
ISHY/SIHY is especially useful for encoding (German) noun compounds in wrapped titles, for instance on product labeling, where hyphens are often suppressed for stylistic reasons. The orthographically correct _Spargelsuppe_, _Spargel-Suppe_ (U+002D) or _Spargel‐Suppe_ (U+2010) may be rendered as _Spargel␤Suppe_, which could then be encoded as _Spargel<ISHY>Suppe_.
Like the existing invisible math operators, IHY/ZWH is used where the presence of its visible counterpart (i.e. HYPHEN) would be required syntactically (i.e. orthographically), but can be derived from context and convention (at least by human readers). This is useful for spell-checking, line-breaking etc., e.g. for words (commercial names in particular) with internal capital letters that would otherwise break orthographic rules and that should be broken at the end of a line without a hyphen added (i.e. like ISHY/SIHY, not SHY). It is indeed very similar to ZERO WIDTH SPACE (ZWSP) and WORD JOINER (WJ): ZWSP separates two words, whereas IHY/ZWH joins them into one but, unlike WJ, still allows a line break.
ISC/SIC is particularly useful in wrapping table headers where a possible line break can take on the separating role of a comma.
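
The rendering rules for SHY and the proposed ISHY/SIHY, IHY/ZWH and ISC/SIC can be sketched as a small lookup table. This is a minimal illustration, not a line-breaking algorithm: the Private Use Area code points are assumed stand-ins (none of the proposed characters has a code point), the names `RENDERING`, `render` and `spelling_form` are hypothetical, and whitespace handling around a break is ignored.

```python
# Hypothetical stand-ins: Private Use Area values borrowed for illustration only.
SHY = "\u00AD"   # existing SOFT HYPHEN
ISHY = "\uE000"  # proposed INVERSE SOFT HYPHEN (assumed stand-in)
IHY = "\uE001"   # proposed INVISIBLE HYPHEN (assumed stand-in)
ISC = "\uE002"   # proposed INVERSE SOFT COMMA (assumed stand-in)

# character -> (text shown inside a line, text shown when the line breaks here)
RENDERING = {
    SHY: ("", "-"),
    ISHY: ("-", ""),
    IHY: ("", ""),
    ISC: (",", ""),
}


def render(text, break_here=False):
    """Render text; if break_here is set, break the line at every special character."""
    out = []
    for ch in text:
        if ch in RENDERING:
            inline, at_break = RENDERING[ch]
            out.append(at_break + "\n" if break_here else inline)
        else:
            out.append(ch)
    return "".join(out)


def spelling_form(text):
    """Form a spell-checker might see: IHY counts as a hyphen, SHY is dropped."""
    return text.replace(IHY, "\u2010").replace(SHY, "")


if __name__ == "__main__":
    print(render("Spargel" + ISHY + "Suppe"))
    print(render("Spargel" + ISHY + "Suppe", break_here=True))
```

With these rules, `Spargel<ISHY>Suppe` shows a hyphen inside a line and none at a break, while `Spargel<SHY>suppe` behaves the other way round.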
IOP and ICP serve two purposes. In mathematical expressions, they enclose subexpressions to override the operator precedence that would otherwise apply, e.g. a sum in the numerator or denominator of a fraction. In text, they enclose annotations that should be displayed outside the normal row of characters, e.g. ruby/furigana pronunciation hints. Both *may* be rendered inline where advanced typographic functionality is unavailable and should then be parenthesized for clarity.
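
The inline fallback for IOP and ICP could look roughly like the following sketch. The Private Use Area code points are again assumed stand-ins, and `render_inline_fallback` is a hypothetical name; a renderer with real fraction or ruby layout would instead use the markers for grouping and show no glyph for them.

```python
# Hypothetical stand-ins: Private Use Area values borrowed for illustration only.
IOP = "\uE010"  # proposed INVISIBLE OPEN PARENTHESIS (assumed stand-in)
ICP = "\uE011"  # proposed INVISIBLE CLOSE PARENTHESIS (assumed stand-in)


def render_inline_fallback(text):
    """Plain-text fallback: show the invisible parentheses as visible ones."""
    return text.replace(IOP, "(").replace(ICP, ")")


if __name__ == "__main__":
    # A fraction whose numerator and denominator are sums.
    fraction = IOP + "a+b" + ICP + "/" + IOP + "c+d" + ICP
    print(render_inline_fallback(fraction))

    # A base text followed by a ruby/furigana annotation.
    ruby = "漢字" + IOP + "かんじ" + ICP
    print(render_inline_fallback(ruby))
```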