Marking up hexadecimal numbers (was: Re: a character for an unknown character)

Marcel Schneider charupdate at
Mon Jan 2 14:57:46 CST 2017

Iʼve messed up my e-mail by not converting HTML to text. Please disregard.
The webmail client I use interprets HTML tags and deletes all unknown ones. Sorry.
On Sat, 31 Dec 2016 22:04:02 +0100 (CET), I wrote:
> On Sat, 31 Dec 2016 11:01:16 +0100, Christoph Päper wrote:
> > 
> > Richard Wordingham :
> > > 
> > >> Perhaps the letters for hexadecimal digits should have been encoded
> > >> separately?
> > > 
> > > The idea has been rejected several times.
> > 
> > It has indeed. That’s why two different technologies have to be used to get 
> > typographically harmonic hexadecimal numbers, e.g. in CSS:
> > 
> > .hex {font-variant-numeric: oldstyle-nums; text-transform: lowercase;}
> > .hex {font-variant-numeric: lining-nums; text-transform: uppercase;}
> > 
> > This works well enough for ‘01ef’ or ‘01EF’, but will fail for conventions like 
> > ‘0x01ef’ and ‘01EFh’. Hence:
> > 
> > .hex::before {content: "0x"; text-transform: none;}
> > .hex::after {content: "h"; text-transform: none;}
> > .hex::after {content: "ₕ";}
> > .hex::after {content: "16"; vertical-align: sub; font-size: smaller; line-height: normal;}
> > .hex::after {content: "16"; font-variant-position: sub;}
> > .hex::after {content: "₁₆";}
> Thank you for the code. I didnʼt know this, so Iʼve tried and found that 
> the automatic prefixes/suffixes cannot be copied from the web page. 
> That seems to me a disadvantage.
> Among the possibilities, you include Unicode subscripts. Is this current 
> practice? That seems to me very interesting to follow up, as it documents 
> that the stable representation scheme is already adopted. Iʼm curious to 
> what extent it is so. 
> I note that the "U+" prefix is missing in the list, obviously because it 
> denotes more than just a hexadecimal number, and is to be hard-coded. 

Alternatively, the CSS style derived from the above could be:

.unicode {font-variant-numeric: lining-nums; text-transform: uppercase;}
.unicode::before {content: "U+"; text-transform: none;}

But again, when readers copy such a scalar value, they get it without 'U+'.
Hence the idea that a parser could process '<span class="unicode">[[H]H]HHHH</span>' 
and insert the prefix right after the opening tag, so that the second line above 
can be dropped. 
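
To make the idea concrete, here is a minimal sketch of such a parser step in 
Python. It is only an illustration under stated assumptions: the function name 
add_u_plus and the regular expression are hypothetical, the markup is assumed to 
look exactly like the span shown above, and a real implementation would work on 
a parsed document rather than on raw text.

```python
import re

# Assumption: code points are marked up exactly as
# <span class="unicode">HHHH</span>, with 4 to 6 hex digits and no prefix.
UNICODE_SPAN = re.compile(r'(<span class="unicode">)([0-9A-Fa-f]{4,6})(</span>)')


def add_u_plus(html: str) -> str:
    """Insert a hard-coded 'U+' right after the opening tag (hypothetical helper)."""
    return UNICODE_SPAN.sub(
        lambda m: m.group(1) + "U+" + m.group(2) + m.group(3), html
    )


if __name__ == "__main__":
    print(add_u_plus('See <span class="unicode">02BC</span> for details.'))
```

Since the prefix then lives in the text itself, it is copied along with the 
digits. A span that already carries 'U+' no longer matches the pattern, so 
running the step twice does not double the prefix.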

Similarly, the '<span class="hex">HHHH</span>' can be complemented with '₁₆', 
or with '0x' or '\x' or whatever, as hard-coded additions by a parser. 
This has IMO two advantages:

1) When the user copies hex numbers from the browser, hex numbers stay prefixed 
or suffixed as such.

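2) When the user pastes hex numbers into a text editor, theyʼre not messed up. 
This applies to a '₁₆' suffix typed as Unicode subscript digits, as opposed to a 
'16' that is only formatted as a subscript ('_{16}'). With the latter, a hex 
number displayed as '1A19₁₆' is turned into '1A1916'.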
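
Both points can be illustrated with a small Python sketch. The names 
add_hex_affix and copy_as_plain_text are hypothetical, and stripping the tags 
with a regular expression is only a crude stand-in for what happens when styled 
text is pasted into a plain-text editor; actual browser behaviour may differ.

```python
import re

# Assumption: hex numbers are marked up as <span class="hex">HHHH</span>.
HEX_SPAN = re.compile(r'(<span class="hex">)([0-9A-Fa-f]+)(</span>)')


def add_hex_affix(html: str, prefix: str = "", suffix: str = "₁₆") -> str:
    """Hard-code a prefix and/or suffix inside each hex span (hypothetical helper)."""
    # A function is used as the replacement so that a prefix such as '\x'
    # is inserted literally instead of being read as a regex escape.
    return HEX_SPAN.sub(
        lambda m: m.group(1) + prefix + m.group(2) + suffix + m.group(3), html
    )


def copy_as_plain_text(html: str) -> str:
    """Crude stand-in for copy and paste: drop every tag, keep the text."""
    return re.sub(r"<[^>]+>", "", html)


if __name__ == "__main__":
    hard_coded = add_hex_affix('<span class="hex">1A19</span>')
    styled = '<span class="hex">1A19</span><sub>16</sub>'
    print(copy_as_plain_text(hard_coded))  # subscript characters survive
    print(copy_as_plain_text(styled))      # subscript formatting is lost
```

The hard-coded '₁₆' is part of the character data and comes through unchanged, 
whereas the '16' that was only styled as a subscript merges with the digits.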

The current policy is no doubt based on the classification of hexadecimal numbers 
(and numbers in other non-decimal numeral systems) as mathematical notation, 
rather than technical notation. On a broad reading of TUS, all measurement units 
are granted the use of superscript digits '²' and '³'. Could this policy be 
extended to include subscript '₁' and '₆'? This may seem an odd question, and 
answering it in the affirmative might open the door to wider use of Latin 
superscripts, first in historical data ('Vᵉ s.'), then in more general data.

The upside I see is content stability and streamlined input (provided that the 
input interface is up to date). Disparity in display may be considered a downside, 
since only fonts that have reduced capitals (Consolas, Lucida Console, Courier) 
render modifier letters exactly like superscripts / ordinal indicators. Iʼve 
got into the habit of using modifier letters in abbreviations, and I find 
they look good in other fonts too.

Right now, all it takes is to put them on the keyboard and tell the user: “Please 
use them if you are comfortable with them; the original encoding for phonetics 
does not preclude re-use and diversification of usage conventions.” Some 
explanation needs to be provided, because people who know something about Unicode 
typically object with the sometimes passionate refrain that these characters are 
for use in phonetics only.

Clearly, with the current wording of the relevant parts of the Unicode Standard, 
Unicode is fueling its own misperception.

Some hints in the opposite direction, ideally in TUS 10.0, to be published this 
year (2017), would (in my opinion) be highly appreciated. Though of course that 
is not enough to make people really happy.

