Re: Is emoji +VS15, +VS16, without VS one or two columns with monospace font? 🏝️

piotrunio-2004@wp.pl piotrunio-2004 at wp.pl
Sat Apr 26 02:49:37 CDT 2025


On 26 April 2025 09:37, Eli Zaretskii <eliz at gnu.org> wrote:

> > Date: Sat, 26 Apr 2025 09:15:02 +0200
> > Cc: unicode at corp.unicode.org <unicode at corp.unicode.org>, b at bapha.be <b at bapha.be>
> > From: "piotrunio-2004 at wp.pl via Unicode" <unicode at corp.unicode.org>
> >
> > On 26 April 2025 09:03, Eli Zaretskii <eliz at gnu.org> wrote:
> >
> > > > Date: Fri, 25 Apr 2025 23:21:05 +0200
> > > > From: "piotrunio-2004 at wp.pl via Unicode" <unicode at corp.unicode.org>
> > > >
> > > > In non-Unix-like terminals, the width is always linearly proportional to the number of bytes the text takes in memory, because that is how a random-access array works. Each character cell takes a constant number of bytes: for example, in VGA-compatible text mode there are 16 bits per character cell (8 bits for attributes and 8 bits for the character code), and in the Win32 console there are 32 bits per character cell (16 bits for attributes and 16 bits for the character code). Whether a character is fullwidth may be determined by the text encoding (some legacy encodings such as Shift JIS store fullwidth characters in the bytes of two consecutive character cells) or by attributes.
> > >
> > > I think you have a very outdated mental model of how the Windows console works and how it represents and encodes characters. In particular, the width of a character is NOT determined by the length of its byte sequence, but by the font glyphs used to display those characters.
> >
> > The CHAR_INFO structure is defined as a 32-bit structure with 16 bits for attributes and 16 bits for the character code. The Win32 API allows arrays of that structure to be read and written directly with the ReadConsoleOutput and WriteConsoleOutput functions. This means there is absolutely no way a native Win32 console could store its characters in a variable number of bytes; in particular, the structure cannot store emoji +VS15, +VS16 sequences, because it was never intended for that purpose.
> You seem to assume that the layout of characters in memory is the same as their layout on display. This was true for MS-DOS terminals, but is no longer true on modern Windows versions, where WriteConsoleOutput and similar APIs do not write directly to the video memory. Instead, they write to some intermediate memory structure, which is thereafter used to draw the corresponding font glyphs on display. I'm quite sure that the actual drawing on the glass is performed using shaping engines such as DirectWrite, which consult the font glyph metrics to determine the width of glyphs on display. The actual width of characters as shown on display is therefore not directly determined by the number of bytes the characters take in their UTF-16 representation.

The legacy Win32 console does not use DirectWrite, so what you are describing seems to be Windows Terminal, which, as I said, is a Unix-like terminal; it is not a native Win32 console at all. It can emulate a Win32 console, but it does not run one natively. This only further shows that emoji +VS15, +VS16 in terminals exist only in a Unix-like context.

