Is emoji +VS15, +VS16, without VS one or two columns with monospace font?🏝️

Eli Zaretskii eliz at gnu.org
Sat Apr 26 02:37:44 CDT 2025


> Date: Sat, 26 Apr 2025 09:15:02 +0200
> Cc: unicode at corp.unicode.org <unicode at corp.unicode.org>,
>         b at bapha.be <b at bapha.be>
> From: "piotrunio-2004 at wp.pl via Unicode" <unicode at corp.unicode.org>
> 
> 
> On 26 April 2025 at 09:03, Eli Zaretskii <eliz at gnu.org> wrote:
> 
>  Date: Fri, 25 Apr 2025 23:21:05 +0200
>  From: "piotrunio-2004 at wp.pl via Unicode" <unicode at corp.unicode.org>
> 
>  In non-Unix-like terminals, the width is always linearly
>  proportional to the number of bytes that the text takes in memory,
>  because that is how a random-access array works.  Each character
>  cell takes a constant number of bytes: for example, in
>  VGA-compatible text mode there are 16 bits per character cell (8
>  bits for attributes and 8 bits for the character code), and in the
>  Win32 console there are 32 bits per character cell (16 bits for
>  attributes and 16 bits for the character code).  Whether a
>  character is fullwidth may be determined by the text encoding (some
>  legacy encodings, such as Shift JIS, store fullwidth characters in
>  the bytes of two consecutive character cells) or by the attributes.
> 
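
(For concreteness, the classic VGA text-mode cell described above can
be sketched in C like this.  This is only a sketch: the struct name is
mine, the attribute bit meanings assume the common color text mode,
and the blink bit can instead mean a bright background, depending on
configuration.)

#include <stdint.h>

/* One VGA text-mode character cell: 16 bits total. */
struct vga_cell {
    uint8_t character;   /* code point in the active 8-bit code page */
    uint8_t attribute;   /* bits 0-3: foreground color,
                            bits 4-6: background color,
                            bit 7: blink (or bright background) */
};

/* Text-mode video memory (physical address 0xB8000) is a flat array
   of these cells, e.g. 80 * 25 of them in the standard color mode. */
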
>  I think you have a very outdated mental model of how the Windows
>  console works and how it represents and encodes characters.  In
>  particular, the width of a character is NOT determined by the
>  length of its byte sequence, but by the font glyphs used to display
>  those characters.
> 
> The CHAR_INFO structure is defined as a 32-bit structure with 16 bits
> for attributes and 16 bits for the character code.  The Win32 API
> allows directly reading and writing arrays of that structure via the
> ReadConsoleOutput and WriteConsoleOutput functions.  This means there
> is no way a native Win32 console could store its characters in a
> variable number of bytes; in particular, the structure cannot store
> emoji VS15/VS16 sequences, because it was never intended for that
> purpose.
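
For reference, that fixed-size cell interface looks like this in
practice.  A minimal sketch, using only documented Win32 calls and
with error handling omitted, which writes two cells to the top-left
corner of the active screen buffer:

#include <windows.h>

int main(void)
{
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);

    /* Each cell holds exactly one 16-bit code unit plus 16 bits of
       attributes; the interface has no room for a multi-code-unit
       sequence within a single cell. */
    CHAR_INFO cells[2];
    cells[0].Char.UnicodeChar = L'A';
    cells[0].Attributes = FOREGROUND_RED | FOREGROUND_INTENSITY;
    cells[1].Char.UnicodeChar = L'B';
    cells[1].Attributes = FOREGROUND_GREEN;

    COORD size = { 2, 1 };              /* buffer: 2 columns, 1 row */
    COORD origin = { 0, 0 };            /* start at cell (0,0) */
    SMALL_RECT region = { 0, 0, 1, 0 }; /* target rectangle on screen */

    WriteConsoleOutputW(out, cells, size, origin, &region);
    return 0;
}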

You seem to assume that the layout of characters in memory is the same
as their layout on display.  This was true for MS-DOS terminals, but
is no longer true on modern Windows versions, where WriteConsoleOutput
and similar APIs do not write directly to the video memory.  Instead,
they write to some intermediate memory structure, which is thereafter
used to draw the corresponding font glyphs on display.  I'm quite sure
that the actual drawing on the glass is performed using shaping
engines such as DirectWrite, which consult the font glyph metrics to
determine the width of glyphs on display.  The actual width of
characters as shown on display is therefore not directly determined by
the number of bytes the characters take in their UTF-16
representation.
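
To make that concrete with the emoji from the Subject line, here is a
minimal sketch, assuming a 16-bit wchar_t (as on Windows) so that the
array below is genuine UTF-16:

#include <stdio.h>
#include <wchar.h>

int main(void)
{
    /* U+1F3DD DESERT ISLAND is the surrogate pair D83C DFDD;
       U+FE0F (VS16) requests emoji presentation. */
    const wchar_t island[] = { 0xD83C, 0xDFDD, 0xFE0F, 0 };

    /* Three 16-bit code units in memory ... */
    printf("UTF-16 code units: %u\n", (unsigned)wcslen(island));

    /* ... but the number of columns the sequence occupies on display
       is decided by the shaping engine and the font's glyph metrics,
       not by this count. */
    return 0;
}

Three code units, one grapheme cluster; whether it is shown in one
column or two is up to the font and the rendering engine.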

