Unicode Digest, Vol 50, Issue 20

Wed Feb 28 00:37:33 CST 2018

The OpenType spec doesn’t in any way suggest that the bits be used that way. It’s impossible to assert that there are no applications out there that do that, but I wouldn’t expect there to be many widely-used apps that do that today.

On the other hand, the bits might affect behaviours like font selection / font binding. For example, if you paste plain text into a rich-text app, the app must select a default font for that text, since it’s a rich-text app. Now, an obvious choice would be to use the font applied to the characters on either side of the insertion point. But if it turned out that that font didn’t support the text being pasted, that would create a rendering problem; so the app probably wants to avoid that. An app just might use these bits as a heuristic to decide whether the current font can support the text or not.
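
To make that heuristic concrete, here is a minimal Python sketch of how an app might consult the bits before keeping the current font for pasted text. It is only an illustration: the function names are hypothetical, the table is a small hand-picked excerpt of the OS/2 bit assignments (worth checking against the spec before relying on it), and no particular app is known to work this way.

```python
# Partial, illustrative excerpt of the OS/2 ulUnicodeRange bit assignments.
# Assumption: these few entries match the spec; verify before real use.
RANGE_BITS = {
    0: (0x0000, 0x007F),   # Basic Latin
    9: (0x0400, 0x04FF),   # Cyrillic
    75: (0x1200, 0x137F),  # Ethiopic
    83: (0xA000, 0xA48F),  # Yi Syllables
}


def bit_for_codepoint(cp):
    """Return the range bit covering cp, or None if this excerpt has none."""
    for bit, (first, last) in RANGE_BITS.items():
        if first <= cp <= last:
            return bit
    return None


def font_may_support(text, font_bits):
    """Heuristic only: False if the font lacks a bit that some character needs.

    font_bits is the set of bit numbers the font declares. Characters with no
    known bit are skipped, because the bits say nothing about them.
    """
    for ch in text:
        bit = bit_for_codepoint(ord(ch))
        if bit is not None and bit not in font_bits:
            return False
    return True
```

A font declaring only bit 0 would be rejected for Ethiopic text and the app would fall back to another font. Since the bits are only declarations, a real app would presumably still confirm against the font's cmap.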

I say that Unicode-range bits probably wouldn’t affect rendering in current apps, though that wasn’t necessarily the case in the past. Word 97 was one of the very first mainstream apps to support Unicode, but it was limited in the scripts that were actually supported. Word 2000 was still early in terms of mainstream Unicode support, and still had limitations. I recall working on font projects for Ethiopic and Yi scripts (with SIL at the time) and needing to set Unicode range or codepage bits in order to get text working in Word using our fonts. One particular issue was a font-binding issue: Word would lump the Yi characters in with CJK (they’re not Western, and they’re not among the few complex scripts that are supported, so assume they’re CJK), but wouldn’t allow the font to be applied until I set bits to make Word think the font supports CJK. With the Ethiopic font, a different effect became apparent, this time a rendering issue: Ethiopic characters have many different widths, but Word ignored the actual glyph metrics and displayed every glyph with the same width (the apparent assumption being that the characters are all CJK and all have the same width). Again, bits had to be set to make it observe the actual glyph metrics. IIRC, in one case I needed to set the Shift-JIS code page bit, and in the other case, to set a bit for one of the kana blocks.

But that was many years ago now. I can’t think of seeing Unicode-range bits affecting rendering in a long time.

Peter

From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Neil Patel via Unicode
Sent: Tuesday, February 27, 2018 8:46 AM
To: unicode at unicode.org; unicode-request at unicode.org
Subject: Re: Unicode Digest, Vol 50, Issue 20

Do the ulUnicodeRange bits get used to dictate rendering behavior or script recognition?

I am just wondering about whether the lack of bits to indicate an Adlam charset can cause other issues in applications.

-Neil

On Sat, Feb 24, 2018 at 1:00 PM, via Unicode <unicode at unicode.org> wrote:
Send Unicode mailing list submissions to
        unicode at unicode.org

To subscribe or unsubscribe via the World Wide Web, visit
        http://unicode.org/mailman/listinfo/unicode
or, via email, send a message with subject or body 'help' to
        unicode-request at unicode.org

You can reach the person managing the list at
        unicode-owner at unicode.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Unicode digest..."

Today's Topics:

   1. Re: metric for block coverage (Norbert Lindenberg via Unicode)

---------- Forwarded message ----------
From: Norbert Lindenberg via Unicode <unicode at unicode.org>
To: Khaled Hosny <khaledhosny at eglug.org>
Cc: James Kass <jameskasskrv at gmail.com>, Adam Borowski <kilobyte at angband.pl>, Unicode Public <unicode at unicode.org>, Norbert Lindenberg <unicode at lindenbergsoftware.com>
Bcc:
Date: Fri, 23 Feb 2018 10:15:32 -0800
Subject: Re: metric for block coverage

> On Feb 18, 2018, at 3:26, Khaled Hosny via Unicode <unicode at unicode.org> wrote:
>
> On Sun, Feb 18, 2018 at 02:14:46AM -0800, James Kass via Unicode wrote:
>> Adam Borowski wrote,
>>
>>> I'm looking for a way to determine a font's coverage of available scripts.
>>> It's probably reasonable to do this per Unicode block.  Also, it's a safe
>>> assumption that a font which doesn't know a codepoint can do no complex
>>> shaping of such a glyph, thus looking at just codepoints should be adequate
>>> for our purposes.
>>
>> You probably already know that basic script coverage information is
>> stored internally in OpenType fonts in the OS/2 table.
>>
>> https://docs.microsoft.com/en-us/typography/opentype/spec/os2
>>
>> Parsing the bits in the "ulUnicodeRange..." entries may be the
>> simplest way to get basic script coverage info.
>
> Though this might not be very reliable since OpenType does not have a
> definition of what it means for a Unicode block to be supported; some
> font authoring tools use a percentage, others use the presence of any
> characters in the range, and fonts might even provide incorrect data for
> any reason.
>
> However, I don’t think script or block coverage is that useful; what
> users are usually interested in is the language coverage.
>
> Regards,
> Khaled

All true. In addition, ulUnicodeRange ran out of bits around Unicode 5.1, so scripts/blocks added to Unicode after that, such as Javanese, Tangut, or Adlam, cannot be represented.
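
As a rough illustration of both points, the Python sketch below decodes the four 32-bit ulUnicodeRange fields into bit numbers, which shows why only bits 0 to 127 exist, and computes block coverage directly from a set of code points, which works for blocks that have no bit, such as Adlam (U+1E900..U+1E95F). The function names are hypothetical, and the set of code points is assumed to have been read from the font's cmap by some other tool.

```python
def set_bits(range1, range2, range3, range4):
    """Return the sorted bit numbers (0-127) set across the four fields.

    ulUnicodeRange1 holds bits 0-31, ulUnicodeRange2 bits 32-63, and so on,
    so nothing beyond bit 127 can be expressed.
    """
    bits = []
    for index, field in enumerate((range1, range2, range3, range4)):
        for offset in range(32):
            if field & (1 << offset):
                bits.append(index * 32 + offset)
    return bits


def block_coverage(codepoints, first, last):
    """Fraction of the code points in [first, last] found in `codepoints`.

    Caveat: the denominator is the whole block, including any unassigned
    code points, so this is only one of several possible definitions.
    """
    present = sum(1 for cp in range(first, last + 1) if cp in codepoints)
    return present / (last - first + 1)
```

For example, a font whose ulUnicodeRange3 has bit 11 set decodes to overall bit 75, while a font covering the first 48 code points of the Adlam block gets a coverage of 0.5 under this definition. Whether 0.5 counts as "supported" is exactly the ambiguity Khaled describes.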

Norbert

