["Unicode"] Re: Some questions about Unicode's CJK Unified Ideograph

Sat May 30 00:46:21 CDT 2015

Hi,

Please let me ask a slightly off-topic question.
䛩 = ⿰言亞 (not ⿰言亜) is encoded at U+46E9. Of course,
亞 and 亜 are generally not unified with each other,
so a separate encoding of ⿰言亜 would be reasonable
(if there is a requirement), but I would like to know whether
the Vietnamese user community distinguishes ⿰言亞 and ⿰言亜
semantically. Do you know anything about this?
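
For reference, the code points involved can be listed with a few lines
of Python. This is only a minimal sketch: the helper name code_points is
made up for illustration, and the two sequences are the ideographic
description sequences written above, not encoded characters.

```python
# Illustrative sketch only; code_points is a hypothetical helper name.
def code_points(text: str) -> list:
    """Return the U+XXXX notation for each character in text."""
    return [f"U+{ord(ch):04X}" for ch in text]

ENCODED = "\u46e9"                   # 䛩, the character already encoded
IDS_ENCODED = "\u2ff0\u8a00\u4e9e"   # ⿰言亞, describes the encoded character
IDS_WANTED = "\u2ff0\u8a00\u4e9c"    # ⿰言亜, the shape seen in the dictionary

for label, text in [("encoded", ENCODED),
                    ("IDS of encoded", IDS_ENCODED),
                    ("IDS of wanted", IDS_WANTED)]:
    print(label, code_points(text))
```

The two sequences differ only in their last component (U+4E9E vs U+4E9C).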

Regards,
mpsuzuki

Ken Whistler wrote:
> 
> 
> On 5/29/2015 5:20 PM, gfb hjjhjh wrote:
>>
>> 1. I have seen a Chinese character ⿰言亜 in a Vietnamese dictionary,
>> the NHAT DUNG THUONG DAM DICTIONARY
>>
> 
>>  So, a.) In http://www.unicode.org/alloc/Pipeline.html , it shows that 
>> CJK Extension E and F have already been accepted, but where can I 
>> check those proposals to see whether the character is in them or not?
>>
> 
> For Extension E, you can check the following code chart:
> 
> http://www.unicode.org/charts/PDF/Unicode-8.0/U80-2B820.pdf
> 
> See: U+2C89A..U+2C931 (pp. 54-56 of the pdf) for the relevant
> radical (#149). But I don't see that character in the list of
> Extension E characters.
> 
> Extension F is harder to track down, because it has not yet been
> approved by the UTC, and comes in two pieces, with different
> progression so far in the ISO committee. Perhaps somebody on this list
> who has better access to the relevant documents can let you
> know whether ⿰言亜 can be found in those sets.
> 
>> and b.) it says that to propose a new character, the proposal must 
>> include information about someone who would agree to provide a computer 
>> font for publishing the standard. Does that mean I have to provide info 
>> about someone who is expected to agree to do so, or do I need to 
>> contact them for their agreement first? And does that mean I can just 
>> put info about someone who is making a free font with full Unicode CJK 
>> coverage into the proposal?
>>
> 
> It would require (eventually) provision of a font with correct display
> of just the character proposed -- but in the case of CJK additions, these
> first go through a process of collection and review by the Ideographic
> Rapporteur Group. The best thing to do is to work with a national
> body concerned with CJK characters and ensure that they include
> this character on their list of submissions for IRG review.
> 
>> and c.) just as in question (b), do "names and addresses of 
>> appropriate contacts within national body or user organizations" 
>> mean the Vietnamese government in this case?
>>
> 
> If the character has not been submitted to the IRG for review, it would
> probably be best to work through the Vietnamese national standards
> body. Again, people on this list may be able to provide you the
> correct contact information for them.
> 
>> 2. Are combining characters like U+20DD intended to work with all 
>> different types of characters, or is this a problem related to 
>> implementation? When I write ゆ⃝ (Japanese Hiragana Letter Yu + 
>> Combining Enclosing Circle), the two appear separate in most fonts I 
>> use, but if I change the Hiragana Yu into a conventional = sign or some 
>> Latin character, most fonts are at least somewhat able to put them 
>> together. Or is there any better/alternative representation in 
>> Unicode that can show Japanese hiragana yu in a circle?
>>
> 
> Combining enclosing marks in principle could work with most characters,
> but in practice most arbitrary combinations do not work very well,
> because they would require very complicated font support.
> 
>> 4. In CJK Symbols and Punctuation, the proper name mark and book name 
>> mark are not included. While there are characters like U+2584, U+FE33, 
>> U+FE4F, and U+FE34 in Unicode that are more or less representations 
>> of the two symbols, they do not appear below or to the left of typed 
>> characters when text flow is horizontal/vertical. Instead, they 
>> occupy their own space, which makes them of little use in daily 
>> life. And while the proper name mark and book name mark can be 
>> represented by text editing software and CSS, those representations 
>> are not ideal, and the marks do match the "Criteria for Encoding 
>> Symbols". Is it possible to make a new Unicode symbol, or change some 
>> current symbol into one that could appear in a suitable place relative 
>> to other characters when typed? A property of the symbol is that when 
>> it is used in a case like 美國紐約, in which 美國 and 紐約 are two 
>> different proper names (place names), an underline should go below 
>> them without any separation between the characters 美 and 國 or 紐 and 
>> 約 (when text is written horizontally), but at the same time the 
>> underline should not be linked between 國 and 紐, as 國 is the end of 
>> the first place name while 紐 is the start of the other.
>>
> 
> What you are talking about is, indeed, best handled by text styling
> attributes, rather than by individual character encoding. These are
> various CJK-specific underlining styles (for horizontal text layout) or
> sidelining styles (for vertical text layout). It is precisely because
> these require highlighting for ranges of characters (without breaks)
> that this kind of text decoration is handled best by style attributes
> (or markup), rather than by individual combining symbols.
> 
> The characters U+FE33, U+FE34, U+FE4F (but not U+2584) are compatibility
> characters only for mapping to old Chinese standards that had individual
> characters encoded for these underlining or sidelining text highlights,
> but which required specialized text layout programs to make any use
> of them.
> 
> --Ken
>
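
A footnote on question 2, as a minimal Python sketch (the helper name
enclose is made up for illustration). At the encoding level, ゆ followed
by U+20DD is simply a two-code-point sequence. NFC normalization leaves
it as two code points, so whether the circle is actually drawn around
the base character is left to the font and the rendering engine.

```python
import unicodedata

ENCLOSING_CIRCLE = "\u20dd"  # COMBINING ENCLOSING CIRCLE

def enclose(base: str) -> str:
    """Append U+20DD to a base character (hypothetical helper)."""
    return base + ENCLOSING_CIRCLE

# General category of U+20DD: "Me" means Mark, enclosing.
print(unicodedata.category(ENCLOSING_CIRCLE))

# HIRAGANA LETTER YU, equals sign, and a Latin letter as base characters.
for base in ("\u3086", "=", "A"):
    seq = unicodedata.normalize("NFC", enclose(base))
    print([f"U+{ord(ch):04X}" for ch in seq])
```

All three sequences are equally valid as encoded text; the difference
seen on screen comes from font support.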