Definition of Values of Property Vertical_Orientation

Richard Wordingham richard.wordingham at ntlworld.com
Mon Aug 22 17:45:38 CDT 2022


On Sun, 21 Aug 2022 15:27:16 -0700
Markus Scherer via Unicode <unicode at corp.unicode.org> wrote:

> On Sun, Aug 21, 2022 at 7:24 AM Richard Wordingham via Unicode <
> unicode at corp.unicode.org> wrote:  
> 
> > I've just spent a painful time verifying the loading of the values
> > of Vertical_Orientation.  After the list of codepoints and ranges
> > in the comments of VerticalOrientation.txt for which the value
> > defaults to Upright, is there any reason for having the ominous
> > wording
> >
> > "All other code points, assigned and unassigned, that are not listed
> > explicitly in the data section of this file are given the value R."
> >
> > Given the current (Version 14.0) and candidate (Version 15.0) data
> > sections, is there any reason for not having the more reassuring
> >
> > "All code points, assigned and unassigned, that are not listed
> > explicitly in the data section of this file are given the value R."
> >  
> 
> sgtm
> 
> > One could then set up the default value of the property as Rotated and
> > then just read in the data section as overrides, as with other files
> > just defining the value of one enumeration property.  
> 
> 
> You can do that today.

The description in the file gives no assurance of that.  I did,
however, find the necessary assurance in UAX #44 Revision 28 Section
4.2.9.1.
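
For concreteness, a minimal Python sketch of that two-step load (default
first, then the data section as overrides).  The function name and the
sample lines are my own illustration, not taken from the file, and the
default "R" is hardcoded here purely as an assumption:

```python
DEFAULT_VO = "R"  # assumption: the file-wide default is Rotated


def load_vo(lines, default=DEFAULT_VO):
    """Return a lookup function mapping a code point to its vo value."""
    overrides = {}
    for line in lines:
        # Everything from '#' to end of line is treated as a comment here.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        first, _, last = fields[0].partition("..")
        lo = int(first, 16)
        hi = int(last, 16) if last else lo
        # A per-code-point dict is simple but memory-hungry for big ranges;
        # a sorted range table would be the more economical choice.
        for cp in range(lo, hi + 1):
            overrides[cp] = fields[1]
    return lambda cp: overrides.get(cp, default)


# Hypothetical sample lines in the general shape of the data section.
sample = [
    "# a pure comment line",
    "00A7          ; U  # illustrative entry",
    "3001..3002    ; Tu # illustrative range",
]
vo = load_vo(sample)
```

With this, vo(0x00A7) yields "U", and any code point absent from the
data lines falls back to the default.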

> > As things stand,
> > loading the property values into an application involves three
> > steps:
> >
> > 1) Set up the default value.
> >  
> 
> Which you can also read from the @missing line.
> 
> # @missing: 0000..10FFFF; R
> 
> https://www.unicode.org/reports/tr44/#Missing_Conventions

But "U+0023 NUMBER SIGN ("#") is used to indicate comments: all
characters from the number sign to the end of the line are considered
part of the comment, and are disregarded when parsing data."
and "The comments are purely informational, and may change format or be
omitted in the future. They should not be parsed for
content."! (Revision 28 Section 4.2.4).

I think something needs to be added at the start of Section 4.2.4 to say
that a line starting with the sequence U+0023, U+0020, U+0040 ("# @") is
exceptionally *not* a comment line.
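
To illustrate the exception a parser has to make, here is a rough Python
sketch that checks for an @missing line before discarding comments.  The
names MISSING_RE and parse_line are hypothetical, and it handles only
the two-field form quoted above (range; value):

```python
import re

# Matches the two-field form, e.g. "# @missing: 0000..10FFFF; R".
MISSING_RE = re.compile(
    r"^#\s*@missing:\s*([0-9A-Fa-f]+)\.\.([0-9A-Fa-f]+)\s*;\s*(\S+)"
)


def parse_line(line):
    """Return (kind, first, last, value), or None for blank/comment lines.

    kind is "missing" for an @missing line and "data" for a data line.
    """
    m = MISSING_RE.match(line)
    if m:
        return ("missing", int(m.group(1), 16), int(m.group(2), 16), m.group(3))
    # Only now is everything from '#' onwards thrown away as a comment.
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    fields = [field.strip() for field in data.split(";")]
    first, _, last = fields[0].partition("..")
    lo = int(first, 16)
    hi = int(last, 16) if last else lo
    return ("data", lo, hi, fields[1])
```

The order of the two tests is the whole point: a parser that strips
comments first, as Section 4.2.4 read literally invites, never sees the
default.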

> > 2) Set up the default values for the Upright regions listed in the
> > comments.
> > 3) Set up the explicit values from the data file.
> >
> > Given the current explicit data, Step 2 is redundant.
> >  
> 
> Right. The comments document which ranges default to Upright, but the
> unassigned and private use code points that have that value are also
> explicitly listed.
> 
> We intend, for some version after 15, to add additional @missing
> lines in this file so that we no longer need to set those
> not-assigned code points to U, but either way you can just parse the
> file without hardcoding assumptions.
> (Unicode 15 has three files with multiple @missing lines.)

So long as the @missing lines are not commented out!

Richard.

