Fwd: RFC 8369 on Internationalizing IPv6 Using 128-Bit Unicode
J Decker via Unicode
unicode at unicode.org
Mon Apr 2 14:04:15 CDT 2018
I was really hoping this was a joke... it didn't hit me it was April 1...
Plane   Allocated code points[note 1]
Totals  280,016  136,755
almost 50% used now.
Though that table omits 655,350 code points as 'unassigned', so it's really
only about 15% (roughly 1/7) used
using only 4-byte UTF-8 or 2-byte UTF-16...
and of those, that's only 20 (plus or minus a fraction of 1) bits?
so a proposal of something a power of 6 larger than that when even just 1
more bit gives another million characters....
I guess if every word were encoded as a single code point... that
wouldn't be enough; there seem to be about 7,716,121 words... so.. 24 bits,
plus 1 to double it for good measure?
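
A rough check of the bit counts above, as a minimal sketch. The helper name bits_needed is made up for illustration, and the word count is simply the figure quoted above, not a verified number.

```python
import math

UNICODE_CODE_SPACE = 0x10FFFF + 1   # code points U+0000..U+10FFFF
WORD_ESTIMATE = 7_716_121           # word count quoted in this message (unverified)


def bits_needed(n: int) -> int:
    """Smallest number of bits that can index n distinct values (hypothetical helper)."""
    return (n - 1).bit_length()


# log2 of the code space is a little over 20, so whole-bit indexing needs 21.
print(round(math.log2(UNICODE_CODE_SPACE), 2), bits_needed(UNICODE_CODE_SPACE))

# Bits required if every word in the estimate got its own code point.
print(bits_needed(WORD_ESTIMATE))

# How many times wider a 128-bit space is, measured in bits.
print(round(128 / bits_needed(UNICODE_CODE_SPACE), 1))
```

By this count 21 bits span the current code space and 23 bits would cover the quoted word estimate, so 24 bits already includes one doubling. 128 bits is roughly six times as many bits as 21, which is where "a power of 6" comes from.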
On Mon, Apr 2, 2018 at 11:15 AM, William_J_G Overington via Unicode <
unicode at unicode.org> wrote:
> Doug Ewell wrote:
> > Martin J. Dürst wrote:
> >> Please enjoy. Sorry for being late with forwarding, at least in some
> >> parts of the world.
> > Unfortunately, we know some folks will look past the humor and use this
> > as a springboard for the recurring theme "Yes, what *will* we do when
> > Unicode runs out of code points?"
> An interesting thing about the document is that it suggests a Unicode code
> point for an individual item of a particular type, what the document terms
> an imoji.
> This being beyond what Unicode encodes at present.
> I wondered if this could link in some ways to the Internet of Things.
> I had never heard of IPv6. Indeed I checked on the Internet to find
> whether that was real. So I have started reading and learning.
> It would, in fact, be quite straightforward to encode what the document
> terms 128-bit Unicode characters.
> For example, U+FFF8 could be used as a base character and then followed by
> a sequence of 32 tag characters, each of those 32 tag characters being from
> the range
> U+E0030 TAG DIGIT ZERO .. U+E0039 TAG DIGIT NINE, U+E0041 TAG LATIN
> CAPITAL LETTER A .. U+E0046 TAG LATIN CAPITAL LETTER F
> That is, a newly-defined character from the Specials and then 32 tag
> characters encoding a hexadecimal code point.
> Now, if that were called 128-bit Unicode then there could be problems of
> policy, but if it were given another name so that it sits upon a Unicode
> structure so as to provide an application platform that can be manipulated
> using Unicode tools, including existing Unicode interchange formats, and
> display formats for character glyphs, then maybe something useful can be
> Thus using 128-bit binary numbers in a local computer system and using
> existing Unicode characters for interchange of information between computer
> systems, converting from the one format to the other depending upon the
> needs for local processing and for interchange of information.
> Of particular significance is the concept of encoding individual items
> each with its own code point.
> Could this be used to relate glyphs to the Internet of Things?
> Could things like International Standard Book Numbers be included, with a
> code point for each book edition?
> What about individual copies of a rare book?
> What about museum items?
> What about paintings and sculptures?
> Could this tie up with serial numbers used in GS1-128 Barcodes?
> Please note that the 128 in GS1-128 refers to the 128 characters of ASCII,
> not to 128-bits.
> I am wondering whether U+FFF8 plus 32 tag characters could be handled
> directly by a GSUB glyph substitution within an OpenType font.
> However, with such a large code space, there would need to be a way to
> access glyph information over the internet, maybe use of a one-glyph web
> font for each glyph would be possible in some way.
> William Overington
> Monday 2 April 2018
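
The U+FFF8-plus-32-tag-characters idea in the quoted message can be sketched roughly as follows. This is only an illustration of the proposal as described there: U+FFF8 is currently unassigned, nothing here is defined by Unicode, and the names encode128, decode128, BASE and TAG_HEX are invented for the example.

```python
BASE = "\uFFF8"        # unassigned Specials code point proposed as the base (assumption)
TAG_OFFSET = 0xE0000   # tag characters mirror ASCII at U+E0000 + ASCII value
HEX_DIGITS = "0123456789ABCDEF"

# Map each tag character (TAG DIGIT ZERO..NINE, TAG LATIN CAPITAL LETTER A..F)
# back to the plain hex digit it stands for.
TAG_HEX = {chr(TAG_OFFSET + ord(d)): d for d in HEX_DIGITS}


def encode128(value: int) -> str:
    """Encode a 128-bit integer as U+FFF8 followed by 32 tag hex digits."""
    if not 0 <= value < 2 ** 128:
        raise ValueError("value must fit in 128 bits")
    digits = f"{value:032X}"  # exactly 32 uppercase hex digits
    return BASE + "".join(chr(TAG_OFFSET + ord(d)) for d in digits)


def decode128(text: str) -> int:
    """Recover the 128-bit integer from a 33-character sequence."""
    if len(text) != 33 or text[0] != BASE:
        raise ValueError("expected U+FFF8 followed by 32 tag characters")
    try:
        digits = "".join(TAG_HEX[c] for c in text[1:])
    except KeyError:
        raise ValueError("non-hexadecimal tag character in sequence") from None
    return int(digits, 16)


sample = encode128(0x20010DB8000000000000000000000001)
print(len(sample), len(sample.encode("utf-8")), hex(decode128(sample)))
```

Each sequence is 33 code points long, and since U+FFF8 takes 3 bytes and each tag character 4 bytes in UTF-8, one 128-bit value costs 131 bytes on the wire in this scheme.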