Suggested improvements for Unicode utilities

Robin Leroy egg.robin.leroy at gmail.com
Fri May 16 13:49:01 CDT 2025


On Fri, 16 May 2025 at 11:33, Nitai Sasson via Unicode <
unicode at corp.unicode.org> wrote:

> I'm making heavy use of Unicode Utilities:
> https://util.unicode.org/UnicodeJsps/
>
> I've encountered some issues, annoyances and nitpicks with these utilities
> that I hope can be addressed. Are they open-source?

Yes, at https://github.com/unicode-org/unicodetools.


> If so, I might even contribute these improvements myself.
>
I wouldn’t recommend that.
These online utilities are deeply coupled with tooling and libraries used
to produce the Standard itself (for good reason: their primary goal is to
assist in maintenance of the Standard), some of which dates back to the
90s, and whose performance is often highly suboptimal.
The documentation is mostly in the form of oral tradition, things that look
easy generally are not, and things that look innocuous can be so slow that
ill-behaved crawlers will bring down the servers.
And the maintainers don’t really have the time to carefully review changes
made by people unfamiliar with the codebase.

> 1. Clicking on a property name leads nowhere, presumably after a website
> restructure. It links to e.g.
> https://util.unicode.org/UnicodeJsps/properties.jsp?a=Bidi_Class#Bidi_Class
> but this does not show anything related to this property.
>
Yes, this page was intentionally blanked; it was one of those that were so
slow as to bring down the servers. I should put something back there at
some point (though probably not exactly what used to be there; with the
expansion of the scope of the tools, that page had become somewhat unusable
regardless of performance concerns).

I’ll remove the links for now; links to a blank page aren’t really helpful…

> 2. Clicking on a property value that contains a single character (e.g. ")"
> for Bidi_Mirroring_Glyph) should open that character in the Character
> Properties utility, not the set of characters that share this property
> value (which is often just the character we came from). Or perhaps there
> could be a separate "inspect this character" button, so existing behavior
> remains the same.

Yes, I should do something about that; it has annoyed me repeatedly.


> 3. Missing values (null) should also be clickable to see all characters
> which do not have a value for that property.

Sure, might as well.

> 1. As just alluded, I could not find a way to find characters that don't
> have a property. For example, I want to find all characters with
> Bidi_Mirrored=Yes but without any value for Bidi_Mirroring_Glyph. Best I
> can tell, this is a missing feature.

[ \p{Bidi_Mirrored} & \p{Bidi_Mirroring_Glyph=@none@} ]
<https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%20%5Cp%7BBidi_Mirrored%7D+%26+%5Cp%7BBidi_Mirroring_Glyph%3D%40none%40%7D%20%5D&g=&i=>
.
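
For anyone who wants to build such links from a script rather than by hand,
here is a minimal sketch in Python. It assumes nothing beyond what is visible
in the link above: the list-unicodeset.jsp page and the query parameters a, g
and i, with g and i left empty. The function name unicodeset_url is made up
for illustration. The standard library writes spaces as "+" throughout, so
the result will not be byte-for-byte identical to the link above, but it
decodes to the same expression.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base URL and parameter names (a, g, i) are copied from the link above;
# g and i are left empty, as they are in that link.
LIST_UNICODESET = "https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp"


def unicodeset_url(expression: str) -> str:
    """Return a list-unicodeset.jsp URL for a UnicodeSet expression.

    Hypothetical helper, not part of the Unicode tools themselves.
    """
    # urlencode percent-encodes brackets, backslashes, braces, '&', '='
    # and '@', and writes each space as '+'.
    query = urlencode({"a": expression, "g": "", "i": ""})
    return f"{LIST_UNICODESET}?{query}"


if __name__ == "__main__":
    expr = r"[ \p{Bidi_Mirrored} & \p{Bidi_Mirroring_Glyph=@none@} ]"
    url = unicodeset_url(expr)
    print(url)
    # Decode the query again to confirm the expression survives the round trip.
    decoded = parse_qs(urlsplit(url).query, keep_blank_values=True)
    print(decoded["a"][0] == expr)
```

Given how easily these pages are overloaded, a script like this should only
be used to produce links for a person to open, not to fetch results in bulk.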

Unicode Set notation, especially with its extensions used in maintenance of
the standard, has long been under-documented, but at its latest meeting in
April the UTC decided to address this issue by authorizing
<https://www.unicode.org/L2/L2025/25085.htm#183-C26> a Proposed Draft
Unicode Technical Standard #61, Unicode Set Notation.

On the @none@ syntax see
https://unicode.org/reports/tr61/#property-comparison,
https://unicode.org/reports/tr61/#Property-Comparisons, and
https://unicode.org/reports/tr61/#Identity-and-Null-Queries.
The review notes in that draft state that the online tools don’t support
some things; this was true when that draft was written, but in most cases
this has since been corrected.

> 3. Similarly, all links to "bidi" from elsewhere in the Unicode website
> should link to "bidi-c" instead. In particular:
> https://www.unicode.org/reports/tr41/tr41-34.html#Demo9

Good catch, I’ll point that out to the editor of UAX #41.


> 4. Bidi mirroring is not displayed [but it should be].
>
That seems doable.

Best regards,

Robin Leroy