ICU data (only .ucm mappings for now) in Rust

Fredrick Brennan copypaste at kittens.ph
Mon May 24 11:01:21 CDT 2021


Hello!

I am writing a font editor called MFEK in Rust which required me to explore the 
ICU data.

I didn't want to use libicu's C binding just to get this data, so after some trial and 
error I figured out how to include the data in a compressed form in the library, and 
added an in-memory index for it.
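
As a rough illustration of the embed-and-index idea (not the crate's actual code), here is a minimal sketch. The names EMBEDDED, index, and lookup are hypothetical, and the embedded text is stored uncompressed so that the example needs only the standard library; a real build would more likely use include_bytes! on a compressed blob and decompress it on first use.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

// Hypothetical embedded table: (mapping name, raw .ucm text).
// Assumption: stored uncompressed here to avoid external crates.
static EMBEDDED: &[(&str, &str)] = &[
    ("example-a", "<code_set_name> \"A\"\n"),
    ("example-b", "<code_set_name> \"B\"\n"),
];

// Name -> raw text index, built once on first access.
fn index() -> &'static HashMap<&'static str, &'static str> {
    static INDEX: OnceLock<HashMap<&'static str, &'static str>> = OnceLock::new();
    INDEX.get_or_init(|| EMBEDDED.iter().copied().collect())
}

// Look up the raw text of a mapping by name.
fn lookup(name: &str) -> Option<&'static str> {
    index().get(name).copied()
}

fn main() {
    println!("{:?}", lookup("example-a"));
    println!("{:?}", lookup("missing"));
}
```

Building the index lazily keeps startup cost at zero for programs that never touch the data.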

The crate is called icu-data: https://docs.rs/icu-data/0.1.0/icu_data/ [1]

crates.io page: https://crates.io/crates/icu-data [2]

For example, if you look up "glibc-IBM437-2.1.2", you get something back that 
starts like this:

Encoding { metadata: {"mb_cur_max": "1", "mb_cur_min": "1", "subchar": "\\x1A", 
"uconv_class": "SBCS", "code_set_name": "IBM437"}, codepoints: [Codepoint { 
uni: '\u{0}', eq_type: Type0, bytestring: [0] }, Codepoint { uni: '\u{1}', eq_type: 
Type0, bytestring: [1] }, Codepoint { uni: '\u{2}', eq_type: Type0, bytestring: [2] }, 
Codepoint { uni: '\u{3}', eq_type: Type0, bytestring: [3] }, ... states: [] }
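
To make the shape of that output more concrete, here is a minimal sketch of how one CHARMAP line of a .ucm file (for example `<U0041> \x41 |0`) could be turned into a Codepoint-like value. The field names follow the Debug output above, but the concrete types, the parse_line function, and the EqType variants beyond Type0 are assumptions for illustration, not the crate's real definitions. Flags other than |0 through |3 are simply rejected in this sketch.

```rust
// Assumed types; only the field names are taken from the Debug output.
#[derive(Debug, Clone, Copy, PartialEq)]
enum EqType {
    Type0,
    Type1,
    Type2,
    Type3,
}

#[derive(Debug, PartialEq)]
struct Codepoint {
    uni: char,
    eq_type: EqType,
    bytestring: Vec<u8>,
}

// Parse one CHARMAP line such as `<U0041> \x41 |0` (hypothetical helper).
fn parse_line(line: &str) -> Option<Codepoint> {
    let mut parts = line.split_whitespace();
    let uni_tok = parts.next()?;
    let bytes_tok = parts.next()?;
    let type_tok = parts.next()?;

    // `<U0041>` -> 'A'
    let hex = uni_tok.strip_prefix("<U")?.strip_suffix('>')?;
    let uni = char::from_u32(u32::from_str_radix(hex, 16).ok()?)?;

    // `\x81\x40` -> [0x81, 0x40]; the piece before the first `\x` is skipped.
    let mut bytestring = Vec::new();
    for chunk in bytes_tok.split("\\x").skip(1) {
        bytestring.push(u8::from_str_radix(chunk, 16).ok()?);
    }
    if bytestring.is_empty() {
        return None;
    }

    let eq_type = match type_tok {
        "|0" => EqType::Type0,
        "|1" => EqType::Type1,
        "|2" => EqType::Type2,
        "|3" => EqType::Type3,
        _ => return None, // unsupported in this sketch
    };

    Some(Codepoint { uni, eq_type, bytestring })
}

fn main() {
    println!("{:?}", parse_line("<U0041> \\x41 |0"));
}
```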

I don't need anything but the UCM charset files at this time, but if anyone is 
interested in adding a module for the other ICU data, this crate is a place to 
start.

Best,
Fred Brennan

--------
[1] https://docs.rs/icu-data/0.1.0/icu_data/
[2] https://crates.io/crates/icu-data