Suggested improvements for Unicode utilities

Nitai Sasson unicode.org at sl.neatnit.net
Fri May 16 04:28:55 CDT 2025


Hi, 
Following the April discussion about arrow characters, I'm working on a draft of an upcoming email that will explain the situation and the proposal much more clearly. It will also serve as a draft for a formal proposal later on. 

While writing it, I'm making heavy use of Unicode Utilities: https://util.unicode.org/UnicodeJsps/

I've encountered some issues, annoyances and nitpicks with these utilities that I hope can be addressed. Are they open-source? If so, I might even contribute these improvements myself.

Character Properties

This tool shows all Unicode properties for a given character: https://util.unicode.org/UnicodeJsps/character.jsp?a=0028
Clicking on a particular value (e.g. "Open" for the property Bidi_Paired_Bracket_Type) shows the set of all characters that share that property value: https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=[:Bidi_Paired_Bracket_Type=Open:]
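For anyone who wants to explore the same data offline, here is a minimal Python sketch that mimics both views: a property listing for one code point, and the set of code points sharing a value. It is not how the JSP tools work. It assumes only the standard-library `unicodedata` module, which exposes just a few properties (not Bidi_Paired_Bracket_Type, for instance) and uses whatever UCD version ships with the interpreter. `char_props` and `chars_with` are hypothetical helper names.

```python
import sys
import unicodedata

# Hypothetical helper: a few properties of one code point, read from the
# UCD version bundled with this Python build (not the live UnicodeJsps data).
def char_props(cp: int) -> dict[str, str]:
    ch = chr(cp)
    return {
        "Name": unicodedata.name(ch, ""),  # "" for code points without a name
        "General_Category": unicodedata.category(ch),
        "Bidi_Class": unicodedata.bidirectional(ch),
        "Bidi_Mirrored": "Yes" if unicodedata.mirrored(ch) else "No",
    }

# Hypothetical helper: every code point up to max_cp whose property equals
# value, i.e. an offline analogue of clicking a value in Character Properties.
def chars_with(prop: str, value: str, max_cp: int = sys.maxunicode) -> list[int]:
    return [cp for cp in range(max_cp + 1) if char_props(cp)[prop] == value]
```

For example, `char_props(0x0028)` reports Bidi_Class "ON" and Bidi_Mirrored "Yes", and `chars_with("Bidi_Mirrored", "Yes", 0x7F)` returns the eight mirrored ASCII brackets. Scanning the full code space without `max_cp` is noticeably slower.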

Issues: 
1. Clicking on a property name leads nowhere useful, presumably because of a website restructure. For example, it links to https://util.unicode.org/UnicodeJsps/properties.jsp?a=Bidi_Class#Bidi_Class, but that page does not show anything related to this property.
2. Clicking on a property value that contains a single character (e.g. ")" for Bidi_Mirroring_Glyph) should open that character in the Character Properties utility, not the set of characters that share this property value (which is often just the character we came from). Or perhaps there could be a separate "inspect this character" button, so existing behavior remains the same. 
3. Missing values (null) should also be clickable, showing all characters that do not have a value for that property. The UnicodeSet utility may not currently support such a query.
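To make issue 2 concrete, here is a hypothetical link-selection rule sketched in Python: a value consisting of a single character links to the Character Properties page, and anything else keeps the current UnicodeSet link. The two URL shapes are copied from the examples above; the function name `link_for_value` and the percent-encoding of other characters are my assumptions, not the site's actual code.

```python
from urllib.parse import quote

BASE = "https://util.unicode.org/UnicodeJsps/"

def link_for_value(prop: str, value: str) -> str:
    """Pick a link target for a clicked property value (hypothetical rule)."""
    if len(value) == 1:
        # A single code point: inspect that character, e.g. character.jsp?a=0029
        return f"{BASE}character.jsp?a={ord(value):04X}"
    # Otherwise keep the existing behavior: list the set sharing this value.
    # Brackets, colons and "=" are left unescaped, as in the URLs above.
    query = f"[:{prop}={value}:]"
    return f"{BASE}list-unicodeset.jsp?a={quote(query, safe='[]:=')}"
```

With this rule, ")" for Bidi_Mirroring_Glyph leads to character.jsp?a=0029, while "Open" for Bidi_Paired_Bracket_Type still leads to the set listing.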

UnicodeSet

https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp
Issues: 
1. As alluded to above, I could not find a way to search for characters that lack a value for a given property. For example, I want to find all characters with Bidi_Mirrored=Yes but no value for Bidi_Mirroring_Glyph. As far as I can tell, this is a missing feature. 
2. It's unclear what these options do: Abbreviate, Collate, UCD format, Escape, Group by, Info. I've tried to understand them by experimentation, but the only one I'm confident I understand is Group by. Could they be explained within the tool page somehow? 
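Until such a query exists, the "Bidi_Mirrored=Yes but no Bidi_Mirroring_Glyph" set can be computed offline. Below is a rough Python sketch. It assumes a local copy of the UCD file BidiMirroring.txt, whose data lines look like `0028; 0029 # LEFT PARENTHESIS` (lines that are entirely comments are skipped), and that Python's bundled `unicodedata` matches the same Unicode version. The helper names are hypothetical.

```python
import sys
import unicodedata

def parse_bidi_mirroring(text: str) -> dict[int, int]:
    """Map code point -> mirroring glyph from BidiMirroring.txt-style text."""
    glyphs = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        src, dst = (field.strip() for field in data.split(";"))
        glyphs[int(src, 16)] = int(dst, 16)
    return glyphs

def mirrored_without_glyph(glyphs: dict[int, int], max_cp: int = sys.maxunicode) -> list[int]:
    """Code points with Bidi_Mirrored=Yes that have no Bidi_Mirroring_Glyph entry."""
    return [cp for cp in range(max_cp + 1)
            if unicodedata.mirrored(chr(cp)) and cp not in glyphs]
```

Usage would be along the lines of `mirrored_without_glyph(parse_bidi_mirroring(open("BidiMirroring.txt", encoding="utf-8").read()))`, with the file path being whatever local copy you have.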

BIDI (UBA)

There are two tools that do the same thing but are implemented differently and support different versions of Unicode: 
"bidi" is implemented in Java and only supports Unicode 6.2. https://util.unicode.org/UnicodeJsps/bidi.jsp
"bidi-c" is implemented in C and supports Unicode versions 6.2 through 14.0. https://util.unicode.org/UnicodeJsps/bidic.jsp

Issues: 
1. The older "bidi" tool is outdated, so it should push users more strongly toward the newer implementation. There is currently a note about this, but it's easy to miss.
2. The list of utilities in the top banner shows "bidi" before "bidi-c". Again, it should direct users to the up-to-date utility rather than the older one.
3. Similarly, all links to "bidi" from elsewhere in the Unicode website should link to "bidi-c" instead. In particular: https://www.unicode.org/reports/tr41/tr41-34.html#Demo9
4. Bidi mirroring is not displayed. For example, at https://util.unicode.org/UnicodeJsps/bidic.jsp?s=%D7%90%3C%D7%91&b=2&u=140&d=2 the rendering is "א<ב", as shown in the only box. The "Reordered Display" table should use <td dir="rtl"> for characters with an odd embedding level so that they are mirrored appropriately. To go the extra mile, a new row could be added to the table indicating which characters have been mirrored. I can create a mockup if this description is unclear.
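The suggested table change could look roughly like the Python sketch below. It is not the bidi-c page's actual code: it assumes the cells arrive in visual order together with their resolved embedding levels. The "mirrored" marker follows UAX #9 rule L4, under which a character is shown with a mirrored glyph when its resolved direction is R (an odd level) and its Bidi_Mirrored value is Yes. The function name and the marker row layout are hypothetical.

```python
import html
import unicodedata

def reordered_display_rows(cells: list[tuple[str, int]]) -> tuple[str, str]:
    """Build the character row plus a hypothetical 'mirrored' marker row.

    cells: (character, resolved embedding level) pairs in visual order.
    """
    char_cells, mirror_cells = [], []
    for ch, level in cells:
        odd = level % 2 == 1
        # dir="rtl" lets the browser choose the mirrored glyph for odd levels.
        dir_attr = ' dir="rtl"' if odd else ""
        char_cells.append(f"<td{dir_attr}>{html.escape(ch)}</td>")
        # UAX #9 rule L4: mirrored iff resolved direction R and Bidi_Mirrored=Yes.
        is_mirrored = odd and unicodedata.mirrored(ch) == 1
        mirror_cells.append(f"<td>{'mirrored' if is_mirrored else ''}</td>")
    return ("<tr>" + "".join(char_cells) + "</tr>",
            "<tr>" + "".join(mirror_cells) + "</tr>")
```

For the example above, with all three characters at level 1, only the "<" cell would be marked as mirrored.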

Thanks, 
Nitai

(P.S. if anyone wants to see the draft I mentioned, I'd be happy to share it at this point. It's not quite ready to be emailed out here just yet.)



