Unicode encoding philosophy

Kent Karlsson kent.b.karlsson at bahnhof.se
Thu Oct 5 12:31:44 CDT 2023



> On 4 Oct 2023, at 22:58, Rebecca Bettencourt via Unicode <unicode at corp.unicode.org> wrote:
> 
> The alignment of quotation marks in a CJK square is an issue affecting very few characters, with no easy mechanism in markup or rich text formatting, with precedent in the form of SVSes for other punctuation marks used in CJK text.
> 
> Italics applies to a large, open-ended set of characters (possibly the entire Unicode character set), has been implemented in just about every form of markup and formatting ever conceived, and has no precedent of implementation using VSes (other than the use of VS15/VS16 for text vs emoji presentation, which even the UTC has determined was a mistake).

I missed that (busy with other things)… But I do not agree that text/emoji variation sequences were a mistake. Indeed, they should be extended and systematized.

However, accepting the proposal in L2023/23212 would be a major mistake. Most of the ‘forms’ given are completely different characters; in particular, those under VS2 / Vertical / Hans are completely different from the supposed “base” characters. (That they are used for more or less the same purpose is not a reason to coalesce them.) Variation sequences with U+FE00/U+FE01 (VS1/VS2) are for what are, in some sense, minor typographical differences that are essentially immaterial unless you care about typography. (The rotated hieroglyphs went too far…)
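
For readers less familiar with the mechanism, here is a minimal Python sketch of what a variation sequence looks like in plain text: a base character followed directly by a variation selector code point. The example uses U+2764 with VS15/VS16 (the text vs emoji presentation case mentioned above). The helper names `describe` and `strip_variation_selectors` are made up for illustration, and the stripping helper only covers U+FE00..U+FE0F, not the supplementary selectors at U+E0100..U+E01EF.

```python
import unicodedata

VS15 = "\uFE0E"  # VARIATION SELECTOR-15: requests text presentation
VS16 = "\uFE0F"  # VARIATION SELECTOR-16: requests emoji presentation

# A variation sequence is simply: base character + variation selector.
heart_text = "\u2764" + VS15
heart_emoji = "\u2764" + VS16


def describe(s):
    """List each code point in s with its Unicode character name."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in s]


def strip_variation_selectors(s):
    """Drop VS1..VS16 (U+FE00..U+FE0F), leaving only the base characters.

    Illustrative only: this shows that the selector carries presentation
    information layered on top of an unchanged base character.
    """
    return "".join(c for c in s if not ("\uFE00" <= c <= "\uFE0F"))


for line in describe(heart_emoji):
    print(line)
```

Both sequences reduce to the same base character once the selector is removed, which is the sense in which a variation sequence is meant to express a variant of one character rather than a different character.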

/Kent K

> 
> -- Rebecca Bettencourt
> 
> 
> On Wed, Oct 4, 2023 at 10:58 AM William_J_G Overington via Unicode <unicode at corp.unicode.org> wrote:
> I have been reading the following.
> 
> https://www.unicode.org/L2/L2023/23212-quotes-svs-proposal.pdf
> 
> I am not an expert on this at all. It looks good and I hope it is 
> implemented.
> 
> What puzzles me though, is that structurally the proposal seems to have 
> much the same encoding philosophy as a suggestion proposed by me in that 
> they both would allow a variation selector to be used so as to conserve 
> in plain text information that is typically these days conserved in rich 
> text and gets lost if plain text is used. My proposal was to use a 
> variation selector to conserve, in a plain text document, information 
> about the use of italics in some text.
> 
> My proposal was rejected, quite strongly.
> 
> So, deep down, what please is the Unicode encoding philosophy that 
> allows variation selectors to be used to conserve some information, yet 
> not other information, in plain text?
> 
> William Overington
> 
> Wednesday 4 October 2023
> 
